C# and the BCL: first-level and second-level pointers

After years of framework development, do you still remember how pointers work?

 

1: Background

1. Story telling

The more high-level languages we work with, the easier it is to forget about pointers and assembly. This article is a refresher on pointers. Pointers are discouraged in everyday C# code, but that does not make them unimportant in C#: the FCL uses them heavily inside types such as String, Encoding, and FileStream. For example:


    private unsafe static bool EqualsHelper(string strA, string strB)
    {
        fixed (char* ptr = &strA.m_firstChar)
        {
            fixed (char* ptr3 = &strB.m_firstChar)
            {
                char* ptr2 = ptr;
                char* ptr4 = ptr3;
                while (num >= 12) {...}
                while (num > 0 && *(int*)ptr2 == *(int*)ptr4) {...}
            }
        }
    }

    public unsafe Mutex(bool initiallyOwned, string name, out bool createdNew, MutexSecurity mutexSecurity)
    {
        byte* ptr = stackalloc byte[(int)checked(unchecked((ulong)(uint)securityDescriptorBinaryForm.Length))];
    }

    private unsafe int ReadFileNative(SafeFileHandle handle, byte[] bytes, out int hr)
    {
        fixed (byte* ptr = bytes)
        {
            num = ((!_isAsync) ? Win32Native.ReadFile(handle, ptr + offset, count, out numBytesRead, IntPtr.Zero) : Win32Native.ReadFile(handle, ptr + offset, count, IntPtr.Zero, overlapped));
        }
    }

Yes: the comfortable world you work in exists because others carry the load for you. Put another way, whether or not you understand pointers directly limits how far you can get when reading low-level source code. Pointers are fairly abstract and test your spatial imagination. Many working programmers may still not understand them, often for lack of a WYSIWYG tool. I hope this article saves you some detours.
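One detail in the EqualsHelper excerpt above is worth spelling out: `*(int*)ptr2 == *(int*)ptr4` reinterprets a char pointer as an int pointer, so one 32-bit read compares two 16-bit chars at a time. The sketch below illustrates that idea only. `WideCompare` and `EqualsTwoAtATime` are hypothetical names, this is not the framework's implementation, and the code must be compiled with unsafe code enabled.

```csharp
using System;

static class WideCompare
{
    // Hypothetical helper (not the framework's EqualsHelper):
    // compares two strings two chars per step.
    public static unsafe bool EqualsTwoAtATime(string a, string b)
    {
        if (a.Length != b.Length) return false;

        fixed (char* pa = a)
        fixed (char* pb = b)
        {
            char* x = pa;
            char* y = pb;
            int remaining = a.Length;

            // One 32-bit read covers two 16-bit chars.
            while (remaining >= 2)
            {
                if (*(int*)x != *(int*)y) return false;
                x += 2;
                y += 2;
                remaining -= 2;
            }

            // At most one char is left over.
            return remaining == 0 || *x == *y;
        }
    }

    static void Main()
    {
        Console.WriteLine(EqualsTwoAtATime("pointer", "pointer")); // True
        Console.WriteLine(EqualsTwoAtATime("pointer", "pointed")); // False
    }
}
```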

2: Understanding pointers with WinDbg

Pointers are abstract, but if you use WinDbg to inspect the memory layout as the program runs, they become much easier to follow. Here are some basic pointer concepts.

1. The & and * operators

& is the address-of operator: it gets the memory address of a variable. * is the dereference operator: it gets the value stored at the address held in a pointer variable. That sounds abstract, so let's look at it in WinDbg.

            unsafe
            {
                int num = 10;
                int* ptr = &num;
                var num2 = *ptr;
                Console.WriteLine(num2);
            }

    0:000> !clrstack -l
    OS Thread Id: 0x41ec (0)
            Child SP               IP Call Site
    0000005b1efff040 00007ffc766208e2 *** WARNING: Unable to verify checksum for ConsoleApp4.exe
    ConsoleApp4.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp4\Program.cs @ 25]
        LOCALS:
            0x0000005b1efff084 = 0x000000000000000a
            0x0000005b1efff078 = 0x0000005b1efff084
            0x0000005b1efff074 = 0x000000000000000a

Look carefully at the three address/value pairs under LOCALS.

<1> int* ptr = &num; => 0x0000005b1efff078 = 0x0000005b1efff084

int* ptr is called a pointer variable. Since it is a variable, it has its own stack address, 0x0000005b1efff078, and the value stored at that address is 0x0000005b1efff084, which is the stack address of num.

<2> var num2 = *ptr; => 0x0000005b1efff074 = 0x000000000000000a

*ptr takes the value held in ptr [0x0000005b1efff084], treats it as an address, and reads the value stored there, so the result is 10.

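If this is still unclear, draw it as a picture of the stack, which helps more than anything: the slot for ptr holds the address of the slot for num, and *ptr follows that address to read 10.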
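To see that *ptr really refers to the same memory as num, a small sketch can write through the pointer and then read the variable back. `AddressOfDemo` and `WriteThroughPointer` are illustrative names, not from the original program, and the code needs unsafe code enabled.

```csharp
using System;

static class AddressOfDemo
{
    // Hypothetical helper: writes through a pointer and returns the variable's value.
    public static unsafe int WriteThroughPointer()
    {
        int num = 10;
        int* ptr = &num;   // ptr holds the stack address of num
        *ptr = 42;         // store through that address: this changes num itself
        return num;
    }

    static void Main()
    {
        Console.WriteLine(WriteThroughPointer()); // 42
    }
}
```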

2. The ** operator

** declares what is called a second-level pointer: a pointer that holds the address of a first-level pointer variable. In the program below, ptr2 points to the stack address of ptr, and the WinDbg output shows the chain directly.


    unsafe
    {
        int num1 = 10;
        int* ptr = &num1;
        int** ptr2 = &ptr;
        var num2 = **ptr2;
    }


    0:000> !clrstack -l
    ConsoleApp4.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp4\Program.cs @ 26]
        LOCALS:
            0x000000305f5fef24 = 0x000000000000000a
            0x000000305f5fef18 = 0x000000305f5fef24
            0x000000305f5fef10 = 0x000000305f5fef18
            0x000000305f5fef0c = 0x000000000000000a
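Reading the LOCALS above: ptr2 (at ...ef10) holds the address of ptr (...ef18), which holds the address of num1 (...ef24). The sketch below uses the two levels separately: *ptr2 is the first-level pointer itself, and **ptr2 is the int it points to. `DoublePointerDemo`, `Run`, and the second variable `b` are assumptions added for illustration. Compile with unsafe code enabled.

```csharp
using System;

static class DoublePointerDemo
{
    // Hypothetical helper: returns the values observed at each step.
    public static unsafe int[] Run()
    {
        int a = 10;
        int b = 20;
        int* ptr = &a;
        int** ptr2 = &ptr;

        int first = **ptr2;   // ptr2 -> ptr -> a, reads 10
        *ptr2 = &b;           // one level down: overwrites ptr, which now points at b
        int second = *ptr;    // reads 20
        **ptr2 = 30;          // two levels down: writes into b

        return new[] { first, second, b };
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine($"{r[0]}, {r[1]}, {r[2]}"); // 10, 20, 30
    }
}
```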

3. The ++ and -- operators

This kind of pointer arithmetic is typically used on arrays, strings, and similar sequence types, as in the following code:

    fixed (int* ptr = new int[3] { 1, 2, 3 }) { }
    fixed (char* ptr2 = "abcd") { }

Initially, ptr points to the start of the array allocated on the heap, that is, the memory address of the element 1. After ptr++ it points to the address of the next int element, 2, and after another ptr++ to the next int, 3. It's very simple. Here is an example:

        unsafe
        {
            fixed (int* ptr = new int[3] { 1, 2, 3 })
            {
                int* cptr = ptr;
                Console.WriteLine(((long)cptr++).ToString("x16"));
                Console.WriteLine(((long)cptr++).ToString("x16"));
                Console.WriteLine(((long)cptr++).ToString("x16"));
            }
        }

    0:000> !clrstack -l
        LOCALS:
            0x00000070c15fea50 = 0x000001bcaac82da0
            0x00000070c15fea48 = 0x0000000000000000
            0x00000070c15fea40 = 0x000001bcaac82dac
            0x00000070c15fea38 = 0x000001bcaac82da8

The three addresses printed to the console hold the values 1, 2, and 3. Note, however, that C# is a managed language and reference types are allocated on the managed heap, so an object's address on the heap can change: the GC periodically reclaims memory and may move objects while doing so. That is why the compiler requires you to pin the object with fixed, so the GC leaves it in place. In this example the pinned range runs from 0x000001bcaac82da0 through 0x000001bcaac82da8 + 4.
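The addresses above advance by 4 each time (…da0, …da4, …da8) because ++ on an int* moves by sizeof(int) bytes, not by one byte. A minimal sketch of that rule, with hypothetical helper names `StrideInBytes` and `Sum`, compiled with unsafe code enabled:

```csharp
using System;

static class StrideDemo
{
    // Hypothetical helper: byte distance between two consecutive int elements.
    public static unsafe long StrideInBytes(int[] values)
    {
        fixed (int* ptr = values)
        {
            return (byte*)(ptr + 1) - (byte*)ptr;
        }
    }

    // Hypothetical helper: walks the pinned array with ++ and adds up the elements.
    public static unsafe int Sum(int[] values)
    {
        int total = 0;
        fixed (int* ptr = values)
        {
            int* end = ptr + values.Length; // one past the last element
            for (int* p = ptr; p < end; p++)
            {
                total += *p;
            }
        }
        return total;
    }

    static void Main()
    {
        int[] values = { 1, 2, 3 };
        Console.WriteLine(StrideInBytes(values)); // 4
        Console.WriteLine(Sum(values));           // 6
    }
}
```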

3: Two examples to help you understand

As the old saying goes, talk alone gets you nowhere; you need working examples. Here are two.

1. Using a pointer to replace a character in a string

We all know that string has a Replace method that swaps a specified character for the one you want. But strings in C# are immutable, so, annoyingly, Replace generates a new string. With a pointer it is different: you can find the memory address of the character to replace and assign the new character directly to that address. Impressive, right? The code below turns abcgef into abcdef, that is, it replaces g with d.

            unsafe
            {
                //Replace 'g' with 'd'
                string s = "abcgef";
                char oldchar = 'g';
                char newchar = 'd';

                Console.WriteLine($"Before replacement:{s}");

                var len = s.Length;

                fixed (char* ptr = s)
                {
                    //Current pointer address
                    char* cptr = ptr;

                    for (int i = 0; i < len; i++)
                    {
                        if (*cptr == oldchar)
                        {
                            *cptr = newchar;
                            break;
                        }
                        cptr++;
                    }
                }

                Console.WriteLine($"After replacement:{s}");
            }

    ----- output ------
    Before replacement:abcgef
    After replacement:abcdef
    The execution is over!

The output shows that the replacement worked. Next, use WinDbg to find the addresses of the string references on the thread stack; you can capture a dump file at a breakpoint.

Of the 10 locals listed under LOCALS, the last 9 that hold addresses all sit close to the starting address of the string, 0x000001ef1ded2d48, which indicates that no new string was generated.
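The same conclusion can be checked without a debugger by comparing object references. In the sketch below, `ReplaceDemo` and `Run` are hypothetical names, and building the string with `new string(...)` is my own choice so that the interned literal "abcgef" is left alone. Compile with unsafe code enabled.

```csharp
using System;

static class ReplaceDemo
{
    // Hypothetical helper: returns { Replace made a new object, in-place write kept the same object }.
    public static unsafe bool[] Run()
    {
        // Built at run time so the interned literal is never modified (an assumption of this sketch).
        string s = new string("abcgef".ToCharArray());

        // string.Replace leaves s alone and hands back a different object.
        string replaced = s.Replace('g', 'd');
        bool replaceMadeNewObject = !ReferenceEquals(s, replaced) && s == "abcgef";

        // The pointer write changes the existing object, so a second reference sees it too.
        string alias = s;
        fixed (char* ptr = s)
        {
            ptr[3] = 'd'; // overwrite 'g' in place
        }
        bool sameObjectChanged = ReferenceEquals(alias, s) && alias == "abcdef";

        return new[] { replaceMadeNewObject, sameObjectChanged };
    }

    static void Main()
    {
        bool[] r = Run();
        Console.WriteLine($"{r[0]}, {r[1]}"); // True, True
    }
}
```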

2. Pointer versus index traversal: a speed contest

We usually traverse an array by index. If we race that against a pointer, which do you think is faster? If I tell you that indexing is a wrapper around pointer access, you can guess the answer. Let's see how much faster it is.

To make the difference easier to see, I will traverse 100 million numbers. The environment is .NET Framework 4.8 in Release mode.


        // Requires: using System; using System.Diagnostics; using System.Linq;
        static void Main(string[] args)
        {
            var nums = Enumerable.Range(0, 100000000).ToArray();

            for (int i = 0; i < 10; i++)
            {
                var watch = Stopwatch.StartNew();
                Run1(nums);
                watch.Stop();
                Console.WriteLine(watch.ElapsedMilliseconds);
            }

            Console.WriteLine(" -------------- ");

            for (int i = 0; i < 10; i++)
            {
                var watch = Stopwatch.StartNew();
                Run2(nums);
                watch.Stop();
                Console.WriteLine(watch.ElapsedMilliseconds);
            }

            Console.WriteLine("The execution is over!");
            Console.ReadLine();
        }

        // Traverse the array with pointers
        public static void Run1(int[] nums)
        {
            unsafe
            {
                // Address of the last element of the array
                fixed (int* ptr1 = &nums[nums.Length - 1])
                {
                    // Address of the first element of the array
                    fixed (int* ptr2 = nums)
                    {
                        int* sptr = ptr2;
                        int* eptr = ptr1;

                        while (sptr <= eptr)
                        {
                            int num = *sptr;
                            sptr++;
                        }
                    }
                }
            }
        }

        // Traverse the array by index
        public static void Run2(int[] nums)
        {
            for (int i = 0; i < nums.Length; i++)
            {
                int num = nums[i];
            }
        }

The numbers speak for themselves: in this test, walking the array with a pointer was nearly twice as fast as going through the array subscript.
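If you want to repeat the measurement yourself, one possible variant is sketched below. The loops above read each element and discard it, and a JIT may simplify a loop body whose result is never used, so this version accumulates a sum and prints it. `TraversalBenchmark`, `SumByPointer`, and `SumByIndex` are hypothetical names; this is a sketch rather than the author's benchmark, and no timings are claimed for it. Compile with unsafe code enabled, in Release mode.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TraversalBenchmark
{
    // Hypothetical helper: sums the array by walking a pinned pointer.
    public static unsafe long SumByPointer(int[] nums)
    {
        long sum = 0;
        fixed (int* start = nums)
        {
            int* end = start + nums.Length; // one past the last element
            for (int* p = start; p < end; p++)
            {
                sum += *p;
            }
        }
        return sum;
    }

    // Hypothetical helper: sums the array by index.
    public static long SumByIndex(int[] nums)
    {
        long sum = 0;
        for (int i = 0; i < nums.Length; i++)
        {
            sum += nums[i];
        }
        return sum;
    }

    static void Main()
    {
        var nums = Enumerable.Range(0, 100000000).ToArray();

        var watch = Stopwatch.StartNew();
        long byPointer = SumByPointer(nums);
        watch.Stop();
        Console.WriteLine($"pointer: {watch.ElapsedMilliseconds} ms (sum {byPointer})");

        watch.Restart();
        long byIndex = SumByIndex(nums);
        watch.Stop();
        Console.WriteLine($"index:   {watch.ElapsedMilliseconds} ms (sum {byIndex})");
    }
}
```

Printing the sums keeps both results in use and doubles as a check that the two traversals visit the same elements.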


Posted on Wed, 20 May 2020 23:05:56 -0700 by mattastic