1: Background
1. Tell a story
The day before yesterday, a friend added wechat to ask for help. There was a problem of high CPU in his program. It was a red envelope at the beginning, which scared me! 🤣🤣🤣
As it was a young year in the south, it was inconvenient for me to deal with it in my hometown, so I didn't help him deal with it at the first time. My friend found the problem the next morning and fed back that it was caused by a while(true), which was a little interesting. In many cases of CPU explosion I analyzed, I didn't encounter a while(true), so I always had regrets. It's really lucky, I caught up years ago, ha ha 😄😄😄.
Next, let's use windbg to analyze it.
2: Windbg analysis
1. Check CPU usage
My friends who have been paying attention to me all know, use! tp command is OK.
0:022> !tp CPU utilization: 95 Unknown format characterUnknown format control characterWorker Thread: Total: 11 Running: 11 Idle: 0 MaxLimit: 32767 MinLimit: 4 Work Request in Queue: 0 -------------------------------------- Number of Timers: 3 -------------------------------------- Completion Port Thread:Total: 4 Free: 4 MaxFree: 8 CurrentLimit: 4 MaxLimit: 1000 MinLimit: 4
The above sentence "Unknown format characterUnknown format control characterWorker" seems discordant. In fact, it means% and I don't know why NETCore has this coding problem 😂😂😂, Next, let's use it! eeversion take a look.
0:022> !eeversion 4.700.21.56803 (3.x runtime) free 4,700,21,56803 @Commit: 28bb6f994c28bc91f09bc0ddb5dcb51d0f066806 Workstation mode In plan phase of garbage collection SOS Version: 5.0.4.36902 retail build
From the basic information, the current situation is netcore 3.x version, and it is obvious that the current GC is in the planning stage. What is the planning stage?
2. What is the planning stage
In short, the GC in the planning stage needs to decide whether the current managed heap is to do a simple mark free operation or a heavyweight compression operation. If the compression processing is to be done, it also needs to involve the relocation of managed heap objects, which often consumes a considerable amount of CPU time. Next, what causes the GC trigger?
3. GC trigger reason
Because the triggering of GC is often caused by the data allocated by the user thread, one link in the whole execution flow triggered by GC is to freeze the CLR execution engine, that is, suspend EE, which can be used in GC Find out in the app.
Why do you have to mention SuspendEE? Because I can pass! t -special find the thread of SuspendEE, so that the accuracy is higher.
0:072> !t -special ThreadCount: 54 UnstartedThread: 0 BackgroundThread: 40 PendingThread: 0 DeadThread: 1 Hosted Runtime: no OSID Special thread type 1 6328 DbgHelper 2 35c0 Finalizer 4 5aac Timer 5 38b0 ThreadpoolWorker 17 3530 ThreadpoolWorker 18 4484 ThreadpoolWorker 19 1e4c ThreadpoolWorker 21 6380 ThreadpoolWorker 44 5bc4 SuspendEE 52 8ac ThreadpoolWorker 54 4164 ThreadpoolWorker 56 61c8 ThreadpoolWorker 58 1fa4 ThreadpoolWorker 60 2788 ThreadpoolWorker 69 48f4 IOCompletion 70 5708 IOCompletion 71 3b58 ThreadpoolWorker 72 17a0 GC 73 2f00 Gate 74 35e8 IOCompletion 75 5730 IOCompletion
You can see that the current thread 44 is the thread that triggers GC. Then it becomes clear. Let's see what thread 44 is doing? Switch to thread 44, and then! Just clrstack.
0:044> !clrstack OS Thread Id: 0x5bc4 (44) Child SP IP Call Site 000000A2B0C3E4C8 00007ffd471ead3a [HelperMethodFrame: 000000a2b0c3e4c8] 000000A2B0C3E5E0 00007ffce8b4e506 System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib]].System.Collections.Generic.IEnumerable.GetEnumerator() [/_/src/System.Private.CoreLib/shared/System/Collections/Generic/List.cs @ 585] 000000A2B0C3E630 00007ffce85e7a10 xxx.Program.DeletexxxExipredDate() 000000A2B0C3E780 00007ffd46bc1f0b System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44] 000000A2B0C3E7B0 00007ffd46bb90b6 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 172] 000000A2B0C3E830 00007ffd46ba535b System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93] 000000A2B0C3EA48 00007ffd47236c93 [GCFrame: 000000a2b0c3ea48] 000000A2B0C3ECB0 00007ffd47236c93 [DebuggerU2MCatchHandlerFrame: 000000a2b0c3ecb0]
From the output information, the problem lies in the deletexxexipredate () method. Next, explore the source code of this method.
private static void DeletexxxExipredDate() { while (true) { foreach (string key in xxx.xxxSpeedLimit.Keys) { try { string[] array = xxx.xxxSpeedLimit[key].Split('$'); if (array.Length > 1) { DateTime dateTime = Convert.ToDateTime(array[1]); if ((DateTime.Now - dateTime).TotalSeconds > 21600.0 && xxx.xxxSpeedLimit.ContainsKey(key)) { xxx.xxxSpeedLimit.TryRemove(key, out var _); } } } catch (Exception ex) { LogHelper.WriteAppExceptionLog("Exception in deleting data:" + ex.Message, ex); } Thread.Sleep(20000); } } }
If you have a friend with rich experience in stepping on pits, I believe you can see the problems in the code at a glance. Yes, when the xxxSpeedLimit dictionary is empty, it is equivalent to a while(true) loop, isn't it?
In order to verify my statement, you can use it! dso finds the memory address of dict and uses it! wconcurrentdict is enough.
0:044> !dso OS Thread Id: 0x5bc4 (44) RSP/REG Object Name ... 000000A2B0C3E708 000001ba8007f618 System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]] 000000A2B0C3E760 000001ba88501cd0 System.Collections.Generic.List`1+Enumerator[[System.String, System.Private.CoreLib]] 000000A2B0C3E768 000001ba80050ec0 System.Threading.ContextCallback 000000A2B0C3E7F8 000001ba80a1a818 System.Threading.Thread 000000A2B0C3EA28 000001ba80a1a898 System.Threading.ThreadStart 0:044> !wconcurrentdict 000001ba8007f618 Empty ConcurrentDictionary
You can see that the current dictionary is empty 😂😂😂
3: Summary
The main reason for this accident: the coder lacks some programming experience and the process processing of empty dictionary when writing business logic, which leads to the embarrassment of while(true), or the thread Sleep (20000) is misplaced 🤣🤣🤣
In general, I am very grateful for the dump provided by this friend, which makes me really believe what I see!