C# generics: boxing/unboxing and type safety; Java generics are syntactic sugar

Neither .NET Core 3.1 nor the latest .NET Framework 4.8 has dropped ArrayList, yet it still has to be discussed, because it is what made the C# team change its ways, abandon the past and start over. First, a simplified ArrayList example:

    public class ArrayList
    {
        private object[] items;
        private int index = 0;

        public ArrayList() { items = new object[10]; }

        // Accepts any type by taking the common base class object.
        public void Add(object item) { items[index++] = item; }
    }

So that values of all kinds (int, double, class instances and so on) can be passed to Add, the code above resorts to a trick: it accepts the common base class object. That introduces two major problems: boxing/unboxing and the loss of type safety.

1. Boxing and unboxing

This is easy to understand. Because the parameter is the base class object, adding a value type triggers a boxing operation, as in the following code:

            ArrayList arrayList = new ArrayList();
            arrayList.Add(1);   // illustrative call: the int is boxed to object here
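What boxing does can also be observed from plain C#. The following is a minimal sketch (the class name BoxingDemo and its methods are made up for illustration): boxing copies the value into a new heap object, so changing the original variable does not affect the box, and boxing the same value twice gives two separate objects.

```csharp
using System;

public static class BoxingDemo
{
    // Returns the boxed value after the original variable has been changed.
    public static int BoxedCopyValue()
    {
        int num = 10;
        object boxed = num;   // box: copies the 4-byte value into a new heap object
        num = 20;             // changing the local does not touch the boxed copy
        return (int)boxed;    // unbox: copies the value back out
    }

    // Boxing the same value twice yields two separate heap objects.
    public static bool BoxesAreSameObject()
    {
        int num = 10;
        object first = num;
        object second = num;
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxedCopyValue());      // 10
        Console.WriteLine(BoxesAreSameObject());  // False
    }
}
```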

<1> Take up more space

I'll use WinDbg to look at this. You surely know that an int takes up 4 bytes, but how many bytes does it take once it is boxed on the heap? I'm curious too 😄.

The original code and IL code are as follows:

        public static void Main(string[] args)
        {
            var num = 10;
            var obj = (object)num;
            Console.Read();
        }

        IL_0000: nop
        IL_0001: ldc.i4.s 10
        IL_0003: stloc.0
        IL_0004: ldloc.0
        IL_0005: box [mscorlib]System.Int32
        IL_000a: stloc.1
        IL_000b: call int32 [mscorlib]System.Console::Read()
        IL_0010: pop
        IL_0011: ret

You can clearly see the box instruction at IL_0005, so boxing definitely happens. Next, grab a dump file.

~0s -> !clrstack -l -> !do 0x0000018300002d48

    0:000> ~0s
    00007ff9`fc7baa64 c3 ret
    0:000> !clrstack -l
    OS Thread Id: 0xfc (0)
            Child SP               IP Call Site
    0000002c397fedf0 00007ff985c808f3 ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 28]
        LOCALS:
            0x0000002c397fee2c = 0x000000000000000a
            0x0000002c397fee20 = 0x0000018300002d48

    0000002c397ff038 00007ff9e51b6c93 [GCFrame: 0000002c397ff038]
    0:000> !do 0x0000018300002d48
    Name:        System.Int32
    MethodTable: 00007ff9e33285a0
    EEClass:     00007ff9e34958a8
    Size:        24(0x18) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00007ff9e33285a0  40005a0        8         System.Int32  1 instance               10 m_value

Size: 24(0x18) bytes, so the boxed int occupies 24 bytes. Why 24? 8 (sync block) + 8 (method table pointer) + 4 (the int value itself) = 20, but on x64 objects are aligned to 8 bytes, so the size is rounded up to a multiple of 8: 8 + 8 + 8 = 24 bytes. A value that needs only 4 bytes balloons to 24 bytes once it is boxed. With 10,000 boxed values, isn't the wasted space rather alarming?
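The 24-byte figure can be checked with a few lines of arithmetic. This is only a sketch of the rounding rule described above, assuming the x64 layout from the dump; the class name BoxedSize and the helper AlignUp are made up for illustration.

```csharp
using System;

public static class BoxedSize
{
    // Rounds size up to the next multiple of alignment (alignment must be a power of two).
    public static int AlignUp(int size, int alignment)
    {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void Main()
    {
        const int syncBlock = 8;            // object header on x64
        const int methodTablePointer = 8;   // pointer to the type's method table
        const int intPayload = 4;           // the boxed int value itself

        int raw = syncBlock + methodTablePointer + intPayload;   // 20
        int boxedSize = AlignUp(raw, 8);

        Console.WriteLine(boxedSize);            // 24
        Console.WriteLine(10000 * boxedSize);    // 240000 bytes boxed
        Console.WriteLine(10000 * intPayload);   // 40000 bytes unboxed
    }
}
```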

<2> Moving a value from the stack to the heap costs work at every step: packing (boxing), transport (copying), and after-sales service and disposal (garbage collection) all carry a heavy labor and machine cost

2. No type safety

This one is simple. Because the parameter type is the base class object, nothing stops programmers from adding a jumble of types. It may well be unintentional, but the compiler cannot catch it. For example:

            ArrayList arrayList = new ArrayList();
            arrayList.Add(new Action<int>((num) => { }));
            arrayList.Add(new object()); 
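To make the consequence concrete, here is a hedged sketch using the real System.Collections.ArrayList (the class name TypeSafetyDemo and the helper TrySum are made up for illustration). The mixed list compiles without complaint and only fails when an element is read back as int, while the List<int> version would reject the wrong element at compile time.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // Sums an ArrayList that is assumed to hold only ints.
    // Returns false if an element turns out to be something else.
    public static bool TrySum(ArrayList values, out int sum)
    {
        sum = 0;
        try
        {
            foreach (object value in values)
            {
                sum += (int)value;   // unbox; throws InvalidCastException for non-int items
            }
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        var arrayList = new ArrayList();
        arrayList.Add(1);
        arrayList.Add(new object());   // compiles; the mistake surfaces only at run time

        int sum;
        Console.WriteLine(TrySum(arrayList, out sum));   // False

        var list = new List<int>();
        list.Add(1);
        // list.Add(new object());     // would not compile: object cannot be converted to int
        Console.WriteLine(list[0]);    // 1
    }
}
```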

Faced with these two awkward problems, the C# team decided to design a new kind of type that would settle the matter once and for all, and that is how generics came about.

2: The emergence of generics

1. Savior

First of all, generics were created to solve exactly these two problems. With the List<T> provided by the base class library you can use List<int>, List<double> and so on, for any type you can think of. How this is implemented under the hood is the focus of this article.

        public static void Main(string[] args)
        {
            List<double> list1 = new List<double>();
            List<string> list3 = new List<string>();
            ...
        }
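For comparison with the object-based ArrayList shown earlier, a generic version of the same idea might look like the sketch below. SimpleList<T> is a hypothetical class written for this illustration, not the actual List<T> source; the growth strategy and the indexer are assumptions added to make it usable.

```csharp
using System;

// Hypothetical generic counterpart of the object-based ArrayList sketch.
public class SimpleList<T>
{
    private T[] items = new T[10];
    private int index = 0;

    public int Count { get { return index; } }

    public void Add(T item)
    {
        if (index == items.Length)
        {
            Array.Resize(ref items, items.Length * 2);   // grow when full
        }
        items[index++] = item;   // stored as T, so an int is not boxed
    }

    public T this[int i]
    {
        get
        {
            if (i < 0 || i >= index) throw new ArgumentOutOfRangeException("i");
            return items[i];
        }
    }
}

public static class SimpleListDemo
{
    public static void Main()
    {
        var numbers = new SimpleList<int>();
        numbers.Add(10);
        // numbers.Add("text");          // would not compile: string is not int
        Console.WriteLine(numbers[0]);   // 10
    }
}
```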

3: Exploring how generics work

The real question is at which stage List<T> -> List<int>, that is T -> int, is carried out. Java's generics, by contrast, are implemented by replacing T with Object underneath. C# certainly does not do that, otherwise there would be no article to write. To find out at which stage the substitution happens, you need to know at least the main stages of C# compilation. To make it easier to understand, I'll draw a picture.

As you can see, the substitution happens either at the MSIL stage or at the JIT compilation stage...

        public static void Main(string[] args)
        {
            List<double> list1 = new List<double>();
            List<int> list2 = new List<int>();
            List<string> list3 = new List<string>();
            List<int[]> list4 = new List<int[]>();
            Console.ReadLine();
        }

1. Exploring the first stage

Because the first stage produces MSIL code, use ILSpy to look at the intermediate code.

        IL_0000: nop
        IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<float64>::.ctor()
        IL_0006: stloc.0
        IL_0007: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
        IL_000c: stloc.1
        IL_000d: newobj instance void class [mscorlib]System.Collections.Generic.List`1<string>::.ctor()
        IL_0012: stloc.2
        IL_0013: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32[]>::.ctor()
        IL_0018: stloc.3
        IL_0019: call string [mscorlib]System.Console::ReadLine()
        IL_001e: pop
        IL_001f: ret

        .class public auto ansi serializable beforefieldinit System.Collections.Generic.List`1<T>
            extends System.Object
            implements class System.Collections.Generic.IList`1<!T>,
                       class System.Collections.Generic.ICollection`1<!T>,
                       class System.Collections.Generic.IEnumerable`1<!T>,
                       System.Collections.IEnumerable,
                       System.Collections.IList,
                       System.Collections.ICollection,
                       class System.Collections.Generic.IReadOnlyList`1<!T>,
                       class System.Collections.Generic.IReadOnlyCollection`1<!T>

As the IL code above shows, the class definition is still System.Collections.Generic.List`1<T>, which means the T -> int substitution has not happened at the intermediate code stage.
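The same picture is visible from C# through reflection. The following minimal sketch (the class name GenericTypeInfo and its helper methods are made up for illustration) shows that the open definition List`1 survives compilation, and that each closed type built from it is a distinct type at run time.

```csharp
using System;
using System.Collections.Generic;

public static class GenericTypeInfo
{
    // True when List<TA> and List<TB> are the same runtime type.
    public static bool SameRuntimeType<TA, TB>()
    {
        return typeof(List<TA>) == typeof(List<TB>);
    }

    // True when List<T> was constructed from the open definition List<>.
    public static bool BuiltFromListDefinition<T>()
    {
        return typeof(List<T>).GetGenericTypeDefinition() == typeof(List<>);
    }

    public static void Main()
    {
        Console.WriteLine(typeof(List<>).Name);              // List`1
        Console.WriteLine(BuiltFromListDefinition<int>());   // True
        Console.WriteLine(SameRuntimeType<int, string>());   // False
        Console.WriteLine(SameRuntimeType<int, int>());      // True
    }
}
```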

2. Exploring the second stage

Looking at the JIT-compiled code is not hard either. Every object has a method table pointer in its header, and that pointer refers to the type's method table, which lists all the methods finally generated for that type. If that is hard to picture, I'll draw a diagram.

Use !dumpheap -stat to find the four List objects on the managed heap.

    0:000> !dumpheap -stat
                  MT    Count    TotalSize Class Name
    00007ff9e3314320        1           32 Microsoft.Win32.SafeHandles.SafeViewOfFileHandle
    00007ff9e339b4b8        1           40 System.Collections.Generic.List`1[[System.Double, mscorlib]]
    00007ff9e333a068        1           40 System.Collections.Generic.List`1[[System.Int32, mscorlib]]
    00007ff9e3330d58        1           40 System.Collections.Generic.List`1[[System.String, mscorlib]]
    00007ff9e3314a58        1           40 System.IO.Stream+NullStream
    00007ff9e3314510        1           40 Microsoft.Win32.Win32Native+InputRecord
    00007ff9e3314218        1           40 System.Text.InternalEncoderBestFitFallback
    00007ff985b442c0        1           40 System.Collections.Generic.List`1[[System.Int32[], mscorlib]]
    00007ff9e338fd28        1           48 System.Text.DBCSCodePageEncoding+DBCSDecoder
    00007ff9e3325ef0        1           48 System.SharedStatics

You can see the four List objects on the managed heap. I'll pick the simplest one, System.Collections.Generic.List`1[[System.Int32, mscorlib]]; the 00007ff9e333a068 in front of it is the method table address.

!dumpmt -md 00007ff9e333a068

    0:000> !dumpmt -md 00007ff9e333a068
    EEClass:         00007ff9e349b008
    Module:          00007ff9e3301000
    Name:            System.Collections.Generic.List`1[[System.Int32, mscorlib]]
    mdToken:         00000000020004af
    File:            C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    BaseSize:        0x28
    ComponentSize:   0x0
    Slots in VTable: 77
    Number of IFaces in IFaceMap: 8
    --------------------------------------
    MethodDesc Table
               Entry       MethodDesc    JIT Name
    00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
    00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
    00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
    00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
    00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
    00007ff9e4202dc0 00007ff9e34dc7f8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Insert(Int32, Int32)

The method table contains too many methods, so I have pruned it. You can clearly see that the Add method accepts an Int32, which means that after JIT compilation the substitution T -> int has finally taken place. Next, dump List<double>:

    0:000> !dumpmt -md 00007ff9e339b4b8
    MethodDesc Table
               Entry       MethodDesc    JIT Name
    00007ff9e3882450 00007ff9e3308de8 PreJIT System.Object.ToString()
    00007ff9e389cc60 00007ff9e34cb9b0 PreJIT System.Object.Equals(System.Object)
    00007ff9e3882090 00007ff9e34cb9d8 PreJIT System.Object.GetHashCode()
    00007ff9e387f420 00007ff9e34cb9e0 PreJIT System.Object.Finalize()
    00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
    00007ff9e3867a00 00007ff9e34e4280 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Insert(Int32, Double)

Both of the above are value types. Let's see what happens when T is a reference type.

    0:000> !dumpmt -md 00007ff9e3330d58
    MethodDesc Table
               Entry       MethodDesc    JIT Name
    00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)

    0:000> !dumpmt -md 00007ff985b442c0
    MethodDesc Table
               Entry       MethodDesc    JIT Name
    00007ff9e3890060 00007ff9e34eb058 PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)

You can see that for List<int[]> and List<string> the JIT uses the type System.__Canon as a placeholder, and both method tables point to the same Add method at Entry address 00007ff9e3890060. Here is the assembly printed for that address:

    0:000> !u 00007ff9e3890060
    preJIT generated code
    System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
    Begin 00007ff9e3890060, size 4a
    >>> 00007ff9`e3890060 57 push rdi
    00007ff9`e3890061 56 push rsi
    00007ff9`e3890062 4883ec28 sub rsp,28h
    00007ff9`e3890066 488bf1 mov rsi,rcx
    00007ff9`e3890069 488bfa mov rdi,rdx
    00007ff9`e389006c 8b4e18 mov ecx,dword ptr [rsi+18h]
    00007ff9`e389006f 488b5608 mov rdx,qword ptr [rsi+8]
    00007ff9`e3890073 3b4a08 cmp ecx,dword ptr [rdx+8]
    00007ff9`e3890076 7422 je mscorlib_ni+0x59009a (00007ff9`e389009a)
    00007ff9`e3890078 488b4e08 mov rcx,qword ptr [rsi+8]
    00007ff9`e389007c 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e389007f 448d4201 lea r8d,[rdx+1]
    00007ff9`e3890083 44894618 mov dword ptr [rsi+18h],r8d
    00007ff9`e3890087 4c8bc7 mov r8,rdi
    00007ff9`e389008a ff152088faff call qword ptr [mscorlib_ni+0x5388b0 (00007ff9`e38388b0)] (JitHelp: CORINFO_HELP_ARRADDR_ST)
    00007ff9`e3890090 ff461c inc dword ptr [rsi+1Ch]
    00007ff9`e3890093 4883c428 add rsp,28h
    00007ff9`e3890097 5e pop rsi
    00007ff9`e3890098 5f pop rdi
    00007ff9`e3890099 c3 ret
    00007ff9`e389009a 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e389009d ffc2 inc edx
    00007ff9`e389009f 488bce mov rcx,rsi
    00007ff9`e38900a2 90 nop
    00007ff9`e38900a3 e8c877feff call mscorlib_ni+0x577870 (00007ff9`e3877870) (System.Collections.Generic.List`1[[System.__Canon, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
    00007ff9`e38900a8 ebce jmp mscorlib_ni+0x590078 (00007ff9`e3890078)

Now look back at List<int> and List<double>. Judging by the Entry column, their Add methods sit at different addresses, which shows that List<int> and List<double> have two completely separate Add methods. If you can read assembly, have a look for yourself...

    MethodDesc Table
               Entry       MethodDesc    JIT Name
    00007ff9e38a3650 00007ff9e34dc6e8 PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
    00007ff9e4428730 00007ff9e34e4170 PreJIT System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)

    0:000> !u 00007ff9e38a3650
    preJIT generated code
    System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
    Begin 00007ff9e38a3650, size 50
    >>> 00007ff9`e38a3650 57 push rdi
    00007ff9`e38a3651 56 push rsi
    00007ff9`e38a3652 4883ec28 sub rsp,28h
    00007ff9`e38a3656 488bf1 mov rsi,rcx
    00007ff9`e38a3659 8bfa mov edi,edx
    00007ff9`e38a365b 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e38a365e 488b4e08 mov rcx,qword ptr [rsi+8]
    00007ff9`e38a3662 3b5108 cmp edx,dword ptr [rcx+8]
    00007ff9`e38a3665 7423 je mscorlib_ni+0x5a368a (00007ff9`e38a368a)
    00007ff9`e38a3667 488b5608 mov rdx,qword ptr [rsi+8]
    00007ff9`e38a366b 8b4e18 mov ecx,dword ptr [rsi+18h]
    00007ff9`e38a366e 8d4101 lea eax,[rcx+1]
    00007ff9`e38a3671 894618 mov dword ptr [rsi+18h],eax
    00007ff9`e38a3674 3b4a08 cmp ecx,dword ptr [rdx+8]
    00007ff9`e38a3677 7321 jae mscorlib_ni+0x5a369a (00007ff9`e38a369a)
    00007ff9`e38a3679 4863c9 movsxd rcx,ecx
    00007ff9`e38a367c 897c8a10 mov dword ptr [rdx+rcx*4+10h],edi
    00007ff9`e38a3680 ff461c inc dword ptr [rsi+1Ch]
    00007ff9`e38a3683 4883c428 add rsp,28h
    00007ff9`e38a3687 5e pop rsi
    00007ff9`e38a3688 5f pop rdi
    00007ff9`e38a3689 c3 ret
    00007ff9`e38a368a 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e38a368d ffc2 inc edx
    00007ff9`e38a368f 488bce mov rcx,rsi
    00007ff9`e38a3692 90 nop
    00007ff9`e38a3693 e8a8e60700 call mscorlib_ni+0x621d40 (00007ff9`e3921d40) (System.Collections.Generic.List`1[[System.Int32, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
    00007ff9`e38a3698 ebcd jmp mscorlib_ni+0x5a3667 (00007ff9`e38a3667)
    00007ff9`e38a369a e8bf60f9ff call mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
    00007ff9`e38a369f cc int 3

    0:000> !u 00007ff9e4428730
    preJIT generated code
    System.Collections.Generic.List`1[[System.Double, mscorlib]].Add(Double)
    Begin 00007ff9e4428730, size 5a
    >>> 00007ff9`e4428730 56 push rsi
    00007ff9`e4428731 4883ec20 sub rsp,20h
    00007ff9`e4428735 488bf1 mov rsi,rcx
    00007ff9`e4428738 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e442873b 488b4e08 mov rcx,qword ptr [rsi+8]
    00007ff9`e442873f 3b5108 cmp edx,dword ptr [rcx+8]
    00007ff9`e4428742 7424 je mscorlib_ni+0x1128768 (00007ff9`e4428768)
    00007ff9`e4428744 488b5608 mov rdx,qword ptr [rsi+8]
    00007ff9`e4428748 8b4e18 mov ecx,dword ptr [rsi+18h]
    00007ff9`e442874b 8d4101 lea eax,[rcx+1]
    00007ff9`e442874e 894618 mov dword ptr [rsi+18h],eax
    00007ff9`e4428751 3b4a08 cmp ecx,dword ptr [rdx+8]
    00007ff9`e4428754 732e jae mscorlib_ni+0x1128784 (00007ff9`e4428784)
    00007ff9`e4428756 4863c9 movsxd rcx,ecx
    00007ff9`e4428759 f20f114cca10 movsd mmword ptr [rdx+rcx*8+10h],xmm1
    00007ff9`e442875f ff461c inc dword ptr [rsi+1Ch]
    00007ff9`e4428762 4883c420 add rsp,20h
    00007ff9`e4428766 5e pop rsi
    00007ff9`e4428767 c3 ret
    00007ff9`e4428768 f20f114c2438 movsd mmword ptr [rsp+38h],xmm1
    00007ff9`e442876e 8b5618 mov edx,dword ptr [rsi+18h]
    00007ff9`e4428771 ffc2 inc edx
    00007ff9`e4428773 488bce mov rcx,rsi
    00007ff9`e4428776 90 nop
    00007ff9`e4428777 e854fbffff call mscorlib_ni+0x11282d0 (00007ff9`e44282d0) (System.Collections.Generic.List`1[[System.Double, mscorlib]].EnsureCapacity(Int32), mdToken: 00000000060039e5)
    00007ff9`e442877c f20f104c2438 movsd xmm1,mmword ptr [rsp+38h]
    00007ff9`e4428782 ebc0 jmp mscorlib_ni+0x1128744 (00007ff9`e4428744)
    00007ff9`e4428784 e8d50f41ff call mscorlib_ni+0x53975e (00007ff9`e383975e) (mscorlib_ni)
    00007ff9`e4428789 cc int 3

Maybe you are a little confused. Let me draw a picture.

4: Summary

The generic T is actually substituted during JIT compilation. The four List<T> instantiations produce four type objects with their corresponding concrete types, so there is no boxing or unboxing problem. As for the type restriction, the Visual Studio compiler tooling enforces it for us ahead of time.



Added by arjuna on Sat, 02 May 2020 20:10:09 +0300