[C++] Virtual functions and their memory layout

1, Function call binding

Associating a function call with a function body is called binding.

When binding is completed before the program runs (by the compiler and linker), it is called early binding. A C compiler has only one way to call a function: early binding.

The problem with early binding is that the compiler cannot know which function to call when all it has is the address of an object, for example a base-class pointer that may actually point at a derived object.

When binding instead happens at run time, based on the actual type of the object, it is called late binding, also known as dynamic binding or run-time binding.
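The difference between the two kinds of binding can be sketched with a small example. It uses the virtual keyword, which is introduced in the next section; the names Base, Derived, describe and demo are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <string>

struct Base {
    // Non-virtual: the call is resolved from the static type (early binding).
    std::string staticName() const { return "Base"; }
    // Virtual: the call is resolved from the dynamic type (late binding).
    virtual std::string dynamicName() const { return "Base"; }
    virtual ~Base() {}
};

struct Derived : Base {
    std::string staticName() const { return "Derived"; }   // hides Base::staticName
    std::string dynamicName() const { return "Derived"; }  // overrides Base::dynamicName
};

// Hypothetical helper: calls both functions through a Base reference.
std::string describe(const Base& obj) {
    return obj.staticName() + "/" + obj.dynamicName();
}

// Hypothetical usage: the same object gives two different answers.
void demo() {
    Derived d;
    Base b;
    assert(describe(d) == "Base/Derived");  // early binding saw Base, late binding saw Derived
    assert(describe(b) == "Base/Base");
}
```

Inside describe() the compiler only knows it has a Base, so the non-virtual call is fixed at compile time, while the virtual call is decided when the program runs.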

2, Virtual functions

To get late binding for a particular function, C++ requires the virtual keyword when the function is declared in the base class. Such functions are called virtual functions. Late binding applies only to virtual functions, and only when they are called through a pointer or reference to the base class that declares them.

The virtual keyword is needed only on the declaration, not on an out-of-class definition. If a function is declared virtual in the base class, it is virtual in all derived classes. Redefining a virtual function in a derived class is called overriding.

[note] Once a function is declared virtual in the base class, every derived-class function whose signature matches the base-class declaration is called through the virtual mechanism.

Example demonstration:

#include <iostream>
using namespace std;

class A
{
private:
	int i;
public:
	virtual void play()const
	{
		cout << "A:play" << endl;
	}

};

class B:public A
{
private:
	int j;
public:
	void play()const
	{
		cout << "B:play" << endl;
	}
};

class C :public B
{
public:
	void play()const
	{
		cout << "C:play" << endl;
	}
};

int main()
{
	B b;
	A a;
	b.play();
	a.play();
	A* ptr = &b;
	ptr->play();
	C c;
	A* ptr2 = &c;
	ptr2->play();
}

Output results:

B:play
A:play
B:play
C:play

The results show that although B's play() is not declared virtual, it overrides A's virtual play(), so the virtual mechanism continues to apply in B and in C.
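The virtual mechanism carries over only when the derived-class function really matches the base-class declaration. The following sketch, with hypothetical class names and assuming a C++11 compiler for the override keyword, shows what happens when the const qualifier is dropped by mistake.

```cpp
#include <cassert>
#include <string>

struct Player {
    virtual std::string play() const { return "Player"; }
    virtual ~Player() {}
};

struct Matching : Player {
    // Same name, parameters and const qualifier: this overrides Player::play.
    std::string play() const override { return "Matching"; }
};

struct Mismatched : Player {
    // Missing const: this is a new function that merely hides Player::play.
    // Writing `override` here would turn the mistake into a compile error.
    std::string play() { return "Mismatched"; }
};

std::string callThroughBase(const Player& p) { return p.play(); }

// Hypothetical usage.
void demo() {
    Matching m;
    Mismatched x;
    assert(callThroughBase(m) == "Matching");
    assert(callThroughBase(x) == "Player");  // Player::play() const was never overridden
}
```

Marking intended overrides with override is a cheap way to let the compiler confirm that the signatures match.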

3, How C++ implements late binding (how the compiler handles virtual functions)

The compiler creates one virtual table (VTABLE) for each class that contains virtual functions, and places the addresses of that class's virtual functions in the table. In every object of such a class, the compiler secretly places a pointer, called the virtual pointer (VPTR), which points to the VTABLE of the object's class.

When a virtual function is called through a base-class pointer, the compiler statically inserts code that fetches the VPTR and looks up the function address in the VTABLE, so that the correct function is called and late binding takes place.

#include <iostream>
using namespace std;

class A
{
private:
	int i;
public:
	void fun1() {}
	void fun2() {}
};

class B 
{
private:
	int i;
public:
	virtual void fun1() {}
	void fun2() {}
};

class C 
{
private:
	int i;
public:
	virtual void fun1() {}
	virtual void fun2() {}
};

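int main()
{
	A a;
	B b;
	C c;
	// Sizes shown are for a 32-bit build, where a pointer is 4 bytes.
	cout << "sizeof(a):" << sizeof(a) << endl;    // 4
	cout << "sizeof(b):" << sizeof(b) << endl;    // 8 (int + vptr)
	cout << "sizeof(c):" << sizeof(c) << endl;    // 8 (still only one vptr)
}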

As the example shows, without virtual functions the size of object a is exactly the size of its data member: one int, 4 bytes. For object b, with a single virtual function, the size is the size without virtual functions plus the size of one pointer: 4 + 4 = 8 bytes in a 32-bit build (in a typical 64-bit build the pointer is 8 bytes and padding brings the total to 16). Object c, with two virtual functions, is also 8 bytes: no matter how many virtual functions a class has, the compiler inserts only one vptr, because the vptr points to a table of function addresses and a single table can hold the addresses of all the virtual functions.
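The same observation can be written without hard-coding 4 and 8, so that it holds in both 32-bit and 64-bit builds. This is a sketch with hypothetical class names; the C++ standard does not promise these sizes, but they are what mainstream compilers (MSVC, GCC, Clang) produce.

```cpp
class NoVirtual  { int i; public: void f1() {} void f2() {} };
class OneVirtual { int i; public: virtual void f1() {} void f2() {} };
class TwoVirtual { int i; public: virtual void f1() {} virtual void f2() {} };

// Without virtual functions the object is just its data member.
static_assert(sizeof(NoVirtual) == sizeof(int), "no hidden pointer expected");

// With virtual functions there is room for the member plus one pointer
// (possibly more because of alignment padding).
static_assert(sizeof(OneVirtual) >= sizeof(int) + sizeof(void*),
              "member plus one hidden pointer expected");

// A second virtual function adds a table entry, not a second pointer.
static_assert(sizeof(TwoVirtual) == sizeof(OneVirtual),
              "still a single hidden pointer expected");
```

Because these are static_assert checks, a compiler with a different layout would report the difference at compile time.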

Virtual function mechanism

#include <iostream>
using namespace std;

class A
{
private:
	int i;
public:
	virtual void func1()const
	{
		cout << "A:func1" << endl;
	}
	virtual void func2()const
	{
		cout << "A:func2" << endl;
	}
	virtual void func3()const
	{
		cout << "A:func3" << endl;
	}


};

class B :public A
{
private:
	int j;
public:
	void func1()const
	{
		cout << "B:func1" << endl;
	}
	void func2()const
	{
		cout << "B:func2" << endl;
	}
	void func3()const
	{
		cout << "B:func3" << endl;
	}
};

class C :public A
{
public:
	void func1()const
	{
		cout << "C:func1" << endl;
	}
	void func2()const
	{
		cout << "C:func2" << endl;
	}
	void func3()const
	{
		cout << "C:func3" << endl;
	}
};

int main()
{
	B b;
	C c;
	A* ptr[] = { &b,&c };
	ptr[0]->func1();
	ptr[1]->func1();
}

Whenever a class containing virtual functions is defined, or a class is derived from one, the compiler creates a unique VTABLE for that class, which holds the addresses of all functions declared virtual in the class or its base classes. In this example A, B and C each have their own VTABLE with three entries (func1, func2, func3), so ptr[0]->func1() prints B:func1 and ptr[1]->func1() prints C:func1. If a function declared virtual in the base class is not redefined in the derived class, the derived class's VTABLE holds the address of the base-class version.
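The last point, that a slot which is not overridden keeps the base-class function, can be checked with a short sketch. The names Base, Partial and demo are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual std::string first() const  { return "Base::first"; }
    virtual std::string second() const { return "Base::second"; }
    virtual ~Base() {}
};

struct Partial : Base {
    // Only first() is overridden; the slot for second() keeps Base's function.
    std::string first() const override { return "Partial::first"; }
};

// Hypothetical usage: both calls go through a Base reference.
void demo() {
    Partial p;
    const Base& ref = p;
    assert(ref.first() == "Partial::first");
    assert(ref.second() == "Base::second");
}
```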

 

With simple inheritance, each object has only one vptr, and it must be initialized to point to the start of the corresponding VTABLE; the constructor does this. Once the vptr is initialized, the object in effect knows what type it is. This self-knowledge matters only when a virtual function is called.

Calling a virtual function through a base-class pointer needs special handling. The compiler does not emit an ordinary function call (an assembly-language CALL to a fixed address); it generates different code instead. The function address is fetched from the VTABLE entry at "vptr + the function's offset in the VTABLE", and the call goes through that address. Because fetching the vptr and determining the actual function address happen at run time, you get the desired late binding.
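To make the "vptr + offset" idea concrete, the mechanism can be imitated by hand with plain function pointers. This is only a model under simplifying assumptions (one table per type, no inheritance of data members); the names Object, Slot, tableA, tableB and callSlot are hypothetical, and real compilers generate equivalent code for you.

```cpp
#include <cassert>
#include <string>

struct Object;                                 // forward declaration
typedef std::string (*Slot)(const Object*);    // one entry in a hand-made VTABLE

struct Object {
    const Slot* vptr;   // hand-made VPTR: points at the table for the object's type
};

std::string baseFunc1(const Object*)    { return "A:func1"; }
std::string baseFunc2(const Object*)    { return "A:func2"; }
std::string derivedFunc1(const Object*) { return "B:func1"; }

// One table per "class"; slot 0 is func1, slot 1 is func2.
const Slot tableA[] = { baseFunc1, baseFunc2 };
const Slot tableB[] = { derivedFunc1, baseFunc2 };  // func2 is not overridden

// What a virtual call boils down to: load the vptr, index the table, call.
std::string callSlot(const Object* obj, int index) {
    return obj->vptr[index](obj);
}

// Hypothetical usage.
void demo() {
    Object a = { tableA };
    Object b = { tableB };
    assert(callSlot(&a, 0) == "A:func1");
    assert(callSlot(&b, 0) == "B:func1");
    assert(callSlot(&b, 1) == "A:func2");
}
```

The caller never names the target function; it only knows the slot index, which is why the choice can be deferred until run time.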

Virtual function memory layout

To view how classes with virtual functions are laid out in memory, you can use the Visual Studio (MSVC) compiler.

Step 1: right-click the source file you want to inspect and choose "Properties".

Step 2: under C/C++ > Command Line, add /d1 reportAllClassLayout to "Additional Options".

/d1 reportAllClassLayout prints the memory layout of every class. You can also use /d1 reportSingleClassLayout followed directly by a class name (for example /d1 reportSingleClassLayoutBase) to print the layout of just that class.

Step 3: choose Build > Build Solution; the layouts appear in the build output.

Example demonstration

Example 1: (general class)

#include <iostream>
using namespace std;

class Base
{
    int a;
    int b;
public:
    void CommonFunction();
};

int main()
{
    cout << sizeof(Base) << endl;  //8
}

  Layout generated after compilation:

 

The example shows that in an ordinary class the member variables are laid out in declaration order and member functions take up no space in the object: sizeof(a) + sizeof(b) = 4 + 4 = 8.

Example 2: (derived class)

class DerivedClass : public Base
{
    int c;
public:
    void DerivedCommonFunction();
};

int main()
{
    cout << sizeof(DerivedClass) << endl;//12
}

  Layout generated after compilation:

In a derived class, the inherited base-class member variables come first in the memory layout, followed by the derived class's own member variables. Member functions still take up no space in the object: sizeof(a) + sizeof(b) + sizeof(c) = 4 + 4 + 4 = 12.
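The layout of Examples 1 and 2 can also be checked from inside a program by measuring member addresses on a live object. This sketch makes the members public so they can be inspected, and offsetIn is a hypothetical helper; the offsets are what mainstream compilers produce rather than something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

struct Base    { int a; int b; void commonFunction() {} };
struct Derived : Base { int c; void derivedCommonFunction() {} };

// Byte offset of a member inside an object, measured on a live instance.
template <typename T, typename M>
std::size_t offsetIn(const T& obj, const M& member) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&member) -
        reinterpret_cast<const char*>(&obj));
}

// Hypothetical usage.
void demo() {
    Derived d = Derived();
    assert(sizeof(Base) == 2 * sizeof(int));      // member functions add nothing
    assert(sizeof(Derived) == 3 * sizeof(int));
    assert(offsetIn(d, d.a) == 0u);               // base members first...
    assert(offsetIn(d, d.b) == sizeof(int));
    assert(offsetIn(d, d.c) == 2 * sizeof(int));  // ...then the derived member
}
```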

Example 3: (class of a single virtual function)

class Base
{
    int a;
    int b;
public:
    void CommonFunction();
    virtual void VirtualFunction();
};

   Layout generated after compilation:

The layout output has two parts: the object's memory layout and the virtual table. In the memory layout, the Visual Studio compiler places the virtual pointer (vfptr) at the start of the object (offset 0), followed by the member variables. In the virtual table, the 0 below &Base_meta is the offset of this table's vfptr within the object, and the 0 on the left is the index of the virtual function (0 is the first; further virtual functions are numbered in order). The compiler builds the virtual table at compile time, and the constructor stores its address in the virtual pointer. In a 32-bit build, sizeof(void*) + sizeof(a) + sizeof(b) = 4 + 4 + 4 = 12.
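That the hidden pointer sits in front of the data members can be observed indirectly: once a virtual function is added, the first member moves from offset 0 to offset sizeof(void*). The sketch below uses hypothetical names and public members, and relies on implementation behaviour shared by MSVC, GCC and Clang, not on a guarantee of the language.

```cpp
#include <cassert>
#include <cstddef>

struct Plain   { int a; int b; void commonFunction() {} };
struct Dynamic { int a; int b; void commonFunction() {} virtual void virtualFunction() {} };

// Byte offset of a member inside an object, measured on a live instance.
template <typename T, typename M>
std::size_t offsetIn(const T& obj, const M& member) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&member) -
        reinterpret_cast<const char*>(&obj));
}

// Hypothetical usage.
void demo() {
    Plain plain = Plain();
    Dynamic dyn = Dynamic();
    assert(offsetIn(plain, plain.a) == 0u);           // no hidden pointer
    assert(offsetIn(dyn, dyn.a) == sizeof(void*));    // pointer occupies the front
    assert(sizeof(Dynamic) == sizeof(void*) + 2 * sizeof(int));
}
```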

Example 4: (classes of multiple virtual functions)

class Base
{
    int a;
    int b;
public:
    void CommonFunction();
    virtual void VirtualFunction();
    virtual void VirtualFunction2();
};

class DerivedClass : public Base
{
    int c;
public:
    void DerivedCommonFunction();
    void VirtualFunction();
};

 

The layout of the base class shows that a class with several virtual functions still has only one virtual pointer, and all of its virtual functions are placed in the same virtual table.

The layout of the derived class shows that the derived class does not get a second virtual pointer; it reuses the one inherited from the base class, which now points to the derived class's own virtual table.

Example 5: (derived classes contain virtual functions)

class DerivedClass : public Base
{
    int c;
public:
    void DerivedCommonFunction();
    virtual void VirtualFunction3();
};

The layout of this derived class shows that the inherited virtual pointer is still at the start of the object, followed by the base-class member variables and then the derived-class member variables. In the virtual table, the addresses of the base class's virtual functions come first, followed by the virtual functions added by the derived class.

Example 6: (multiple inheritance)

class Base
{
    int a;
    int b;
public:
    void CommonFunction();
    void virtual VirtualFunction();
};


class DerivedClass1 : public Base
{
    int c;
public:
    void DerivedCommonFunction();
    void virtual VirtualFunction();
};

class DerivedClass2 : public Base
{
    int d;
public:
    void DerivedCommonFunction();
    void virtual VirtualFunction();
};

class DerivedDerivedClass : public DerivedClass1, public DerivedClass2
{
    int e;
public:
    void DerivedDerivedCommonFunction();
    void virtual VirtualFunction();
};

int main()
{

}

Base, DerivedClass1 and DerivedClass2 are laid out as described above. The focus here is DerivedDerivedClass.

The layout shows that the DerivedClass1 part, the DerivedClass2 part and DerivedDerivedClass's own member variable e are placed one after another. The object starts with the DerivedClass1 part, which contains the virtual pointer inherited from Base, the member variables a and b, and its own member variable c. The DerivedClass2 part follows; it has its own independent copy of Base, containing a virtual pointer inherited from Base, the member variables a and b, and its own member variable d.

The two virtual pointers point to two separate virtual tables. The vfptr of the DerivedClass1 part is at offset 0, and the -16 in the second virtual table corresponds to the offset of the DerivedClass2 part's vfptr, 16 bytes into the object.
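The presence of two independent Base parts can be confirmed by converting the same object to Base* along the two inheritance paths. The class names in this sketch are hypothetical and shortened.

```cpp
#include <cassert>

struct Base   { int a; virtual ~Base() {} };
struct Left   : Base { int c; };
struct Right  : Base { int d; };
struct Bottom : Left, Right { int e; };

// Hypothetical usage.
void demo() {
    Bottom obj;
    Left*  asLeft  = &obj;
    Right* asRight = &obj;       // the pointer is adjusted past the Left part
    Base*  viaLeft  = asLeft;
    Base*  viaRight = asRight;

    assert(viaLeft != viaRight); // two distinct Base subobjects
    assert(static_cast<void*>(asRight) != static_cast<void*>(&obj));
    // Base* ambiguous = &obj;   // would not compile: Base is an ambiguous base
}
```

The adjusted address of the Right part is the run-time counterpart of the second vfptr offset shown in the layout output.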

Example 7: (virtual inheritance and multi inheritance mixing)

class DerivedClass1: virtual public Base
{
    int c;
public:
    void DerivedCommonFunction();
    void virtual VirtualFunction();
};

class DerivedClass2 : virtual public Base
{
    int d;
public:
    void DerivedCommonFunction();
    void virtual VirtualFunction();
};

class DerivedDerivedClass :  public DerivedClass1, public DerivedClass2
{
    int e;
public:
    void DerivedDerivedCommonFunction();
    void virtual VirtualFunction();
};

Note: DerivedClass1 has changed. It now has a virtual base pointer vbptr of its own in addition to the virtual function pointer vfptr inherited from Base. The vbptr points to the virtual base table (vbtable) and the vfptr points to the virtual function table (vftable). The 8 in the vbtable is the offset from the vbptr to the Base part, which begins with the vfptr, and the -8 in the vftable gives the offset of that table's virtual pointer in the object.

In addition, the vbptr and the member variable c of DerivedClass1 are placed before the inherited Base part.

DerivedClass2 is laid out like DerivedClass1.

DerivedDerivedClass contains three pointers: the vbptr inherited from DerivedClass1, the vbptr inherited from DerivedClass2, and a single vfptr belonging to the one shared Base.

Virtual inheritance removes the duplicated base class at the cost of extra table pointers in each object.
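With virtual inheritance the two conversion paths reach the same Base part, which can be checked as follows. The class names are again hypothetical and shortened.

```cpp
#include <cassert>

struct Base   { int a; virtual ~Base() {} };
struct Left   : virtual Base { int c; };
struct Right  : virtual Base { int d; };
struct Bottom : Left, Right { int e; };

// Hypothetical usage.
void demo() {
    Bottom obj;
    Base* viaLeft  = static_cast<Left*>(&obj);
    Base* viaRight = static_cast<Right*>(&obj);
    Base* direct   = &obj;       // unambiguous: there is only one Base

    assert(viaLeft == viaRight); // both paths reach the shared Base subobject
    assert(direct == viaLeft);

    obj.a = 7;                   // also unambiguous, for the same reason
    assert(viaRight->a == 7);
}
```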

4, Abstract base classes and pure virtual functions

When designing a class hierarchy, you often want the base class to serve only as an interface for its derived classes, so that users never create an object of the base class itself. To do this, add at least one pure virtual function to the base class, which makes it an abstract class. A pure virtual function is declared with the keyword virtual and followed by = 0. If you try to create an object of an abstract class, the compiler rejects the code.

When inheriting from an abstract class, all pure virtual functions must be implemented; otherwise the derived class is also abstract. A pure virtual function lets you put a member function in the interface without having to supply a body that might be meaningless for the base class.

The only reason to establish a common interface is so that it can be expressed differently by each derived class. When you want to manipulate a set of classes through a common interface, and the interface itself does not need to be implemented (or fully implemented), create an abstract class.

Pure virtual function syntax:

virtual return_type function_name(parameters) = 0;

A pure virtual function normally has only a declaration and no function body. Adding = 0 to the end of a virtual function declaration marks the function as pure virtual.

= 0 does not mean that the function returns 0. It is purely notation that tells the compiler this is a pure virtual function.

A class containing pure virtual functions is called an abstract class because it cannot be instantiated. The reason is that its pure virtual functions are left for derived classes to implement, so the class on its own is incomplete.
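Whether a class is abstract can be tested at compile time with the standard std::is_abstract type trait (C++11). The sketch below uses hypothetical class names; Unfinished inherits a pure virtual function without implementing it and therefore stays abstract.

```cpp
#include <cassert>
#include <type_traits>

struct Shape {
    virtual double area() const = 0;   // pure virtual
    virtual ~Shape() {}
};

struct Unfinished : Shape {};          // does not implement area(): still abstract

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

static_assert(std::is_abstract<Shape>::value, "Shape has a pure virtual function");
static_assert(std::is_abstract<Unfinished>::value, "area() is still unimplemented");
static_assert(!std::is_abstract<Square>::value, "Square implements area()");

// Code written against the interface works with any concrete shape.
double areaOf(const Shape& s) { return s.area(); }

// Hypothetical usage.
void demo() {
    // Shape s;                        // would not compile: Shape is abstract
    Square sq(3.0);
    assert(areaOf(sq) == 9.0);
}
```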

Examples of pure virtual functions:

#include <iostream>
using namespace std;

//Line
class Line{
public:
    Line(float len);
    virtual ~Line(){ }  //virtual destructor, so objects can be deleted through a Line pointer
    virtual float area() = 0;
    virtual float volume() = 0;
protected:
    float m_len;
};
Line::Line(float len): m_len(len){ }

//rectangle
class Rec: public Line{
public:
    Rec(float len, float width);
    float area();
protected:
    float m_width;
};
Rec::Rec(float len, float width): Line(len), m_width(width){ }
float Rec::area(){ return m_len * m_width; }

//Cuboid
class Cuboid: public Rec{
public:
    Cuboid(float len, float width, float height);
    float area();
    float volume();
protected:
    float m_height;
};
Cuboid::Cuboid(float len, float width, float height): Rec(len, width), m_height(height){ }
float Cuboid::area(){ return 2 * ( m_len*m_width + m_len*m_height + m_width*m_height); }
float Cuboid::volume(){ return m_len * m_width * m_height; }

//cube
class Cube: public Cuboid{
public:
    Cube(float len);
    float area();
    float volume();
};
Cube::Cube(float len): Cuboid(len, len, len){ }
float Cube::area(){ return 6 * m_len * m_len; }
float Cube::volume(){ return m_len * m_len * m_len; }

int main(){
    Line *p = new Cuboid(10, 20, 30);
    cout<<"The area of Cuboid is "<<p->area()<<endl;
    cout<<"The volume of Cuboid is "<<p->volume()<<endl;
    delete p;  //release the Cuboid before reusing the pointer

    p = new Cube(15);
    cout<<"The area of Cube is "<<p->area()<<endl;
    cout<<"The volume of Cube is "<<p->volume()<<endl;
    delete p;

    return 0;
}

The inheritance relationship of these four classes is: Line --> Rec --> Cuboid --> Cube.

Line is an abstract class that declares two pure virtual functions, area() and volume().

The Rec class implements area(); to implement a pure virtual function means to define its function body. However, Rec still cannot be instantiated, because it does not implement the inherited volume() function. volume() is still pure virtual, so Rec is still an abstract class.

Only in Cuboid, which implements volume(), does the class become complete and instantiable.

Note that the Line class represents a line, which has neither area nor volume, yet it still declares the two pure virtual functions area() and volume(). The intention is clear: Line does not need to be instantiated, but it imposes a constraint on derived classes. A derived class must implement both functions, to compute area and volume, or it cannot be instantiated.

In real development, you can define an abstract base class that implements only part of the functionality and leaves the rest to derived classes (whoever derives, implements). The functions left unimplemented are usually unnecessary or impossible to implement in the base class. Although the abstract base class does not implement them, it forces derived classes to do so. This is the "overlord clause" of the abstract base class.

Besides constraining derived classes, abstract base classes also enable polymorphism. Note the line Line *p = new Cuboid(10, 20, 30); in the code. The pointer p has type Line*, yet it can call the area() and volume() functions of the derived class, precisely because these two functions are declared as pure virtual functions in Line. If they were not declared in Line, the calls p->area() and p->volume() would not compile. This is the main purpose of pure virtual functions in C++: to let code written against the base class operate on derived-class objects transparently.

[note] Because an abstract class cannot be instantiated, it cannot be passed to a function by value, which also prevents object slicing. An abstract class therefore guarantees that upcasting always goes through a pointer or reference.

Object slicing

If you upcast an object by value, instead of through a pointer or reference, the object is sliced.

#include <iostream>
#include <string>
using namespace std;

class Pet
{
private:
	string pname;
public:
	Pet(string name):pname(name){}
	virtual string getName()const { return pname; }
	virtual void description()const 
	{ 
		cout << "The pet's name is " << getName() << endl;
	}
};

class Dog:public Pet
{
private:
	string actor;
public:
	Dog(const string& name,const string& dactor) :Pet(name),actor(dactor) {}
	virtual void description()const
	{
		cout << "The pet's name is " << Pet::getName() << endl;
		cout << "The Dog likes to " << actor << endl;
	}
};

void descript(Pet p)
{
	p.description() ;
}

int main()
{
	Pet p("Alpha");
	Dog d("BigYellow", "eat");
	descript(p);
	descript(d);
}

The output shows that Pet's description() is called both times. This is not the result we want: for the second call we wanted Dog's description(), which also reports that the dog likes to eat.

The reason is that the derived-class object d is passed to descript() by value. The compiler accepts the call, but copies only the Pet part of the object and slices off the derived part.
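A common way to avoid slicing is to pass the object by reference (or pointer) instead of by value. The sketch below reworks the example with functions that return strings so the difference is easy to compare; describeByValue, describeByReference and the member name activity are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <string>

class Pet {
    std::string pname;
public:
    explicit Pet(const std::string& name) : pname(name) {}
    virtual ~Pet() {}
    std::string getName() const { return pname; }
    virtual std::string description() const { return "Pet " + pname; }
};

class Dog : public Pet {
    std::string activity;
public:
    Dog(const std::string& name, const std::string& act) : Pet(name), activity(act) {}
    std::string description() const override {
        return "Dog " + getName() + " likes to " + activity;
    }
};

// By value: only the Pet part of the argument is copied (slicing).
std::string describeByValue(Pet p) { return p.description(); }

// By reference: the original object is used, so late binding picks Dog's version.
std::string describeByReference(const Pet& p) { return p.description(); }

// Hypothetical usage.
void demo() {
    Dog d("BigYellow", "eat");
    assert(describeByValue(d) == "Pet BigYellow");
    assert(describeByReference(d) == "Dog BigYellow likes to eat");
}
```

Changing the parameter of descript() from Pet to const Pet& would give the original example the intended output in the same way.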

 

Keywords: C++

Added by capslock118 on Mon, 18 Oct 2021 23:56:36 +0300