Looking at Scheduling Tracking with GODEBUG

One of Go's strengths is the GODEBUG environment variable. GODEBUG settings let a Go program emit debugging information at run time, so you can inspect details of the scheduler or the garbage collector directly, as needed, without installing any extra tooling. It is very convenient. In this article I will walk through the scheduler-related parts of GODEBUG; I hope it helps you.

Before starting, though, readers who have not touched this area should review the following background, which will make the debug output much easier to understand.

Pre-knowledge

The Go scheduler's main job is to distribute runnable goroutines across OS threads running on processors. Any discussion of the scheduler revolves around three abbreviations:

  • G: goroutine. Every go func call creates a new G.
  • P: logical processor. The number of Ps defaults to the number of CPU cores and can be changed via GOMAXPROCS.
  • M: OS thread.

These three interact according to Go's M:N scheduling model: an M must be bound to a P, and the M then keeps looking for runnable Gs to execute. If you want to go deeper, read <Go Runtime Scheduler> in detail. Extracting its workflow diagram gives a simple outline, as follows:

  1. Executing go func() creates a brand-new goroutine, which we call G.
  2. The new G is placed in a P's local queue or in the global queue, ready for the next step.
  3. An M is woken up or created to execute the G.
  4. The M runs its scheduling loop continuously.
  5. It finds a runnable G and executes the corresponding task.
  6. When the G finishes, the M re-enters the scheduling loop.

Both the global queue and the local queues mentioned above store Gs that are waiting to run. The difference is that each local queue is bounded: it holds at most 256 Gs. A newly created G goes to the current P's local queue first; if that queue is full, half of its Gs are moved to the global queue. This can be understood as sharing and rebalancing of scheduling resources.
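The overflow policy above can be sketched in a few lines. This is a simplified model for illustration only, not the runtime's actual implementation (the real local queue is a lock-free 256-slot ring buffer):

```go
package main

import "fmt"

// G stands in for a runtime goroutine descriptor in this sketch.
type G struct{ id int }

const localCap = 256 // the real local queue also holds at most 256 Gs

// P holds a bounded local run queue in this simplified model.
type P struct{ runq []*G }

var globalQueue []*G

// runqput appends g to p's local queue; if the queue is full, half of
// the local queue plus g is moved to the global queue.
func runqput(p *P, g *G) {
	if len(p.runq) < localCap {
		p.runq = append(p.runq, g)
		return
	}
	half := len(p.runq) / 2
	globalQueue = append(globalQueue, p.runq[:half]...)
	globalQueue = append(globalQueue, g)
	p.runq = append([]*G(nil), p.runq[half:]...)
}

func main() {
	p := &P{}
	for i := 0; i < localCap+1; i++ { // one more G than the local queue holds
		runqput(p, &G{id: i})
	}
	fmt.Println(len(p.runq), len(globalQueue)) // prints 128 129
}
```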

The diagram also shows a steal operation. What is it for? When a new G is created, or a G becomes runnable, it is pushed onto the local queue of the current P. When a P finishes executing a G, it pops the next G from its local queue and checks whether the queue is empty. If it is empty, the P randomly picks another P and tries to steal half of the runnable Gs from that P's local queue. For example:

In this example, P2 finds no runnable G in its local queue, so it runs the work-stealing algorithm: it randomly selects another processor, P1, and steals three Gs from P1's local queue into its own. Now both P1 and P2 have runnable Gs, P1's surplus Gs are not wasted, and scheduling work flows more evenly across processors.
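The stealing step can likewise be sketched. Again, this is an illustrative model, not the runtime's real runqsteal:

```go
package main

import "fmt"

// Minimal stand-ins for the runtime's structures in this sketch.
type g struct{ id int }
type p struct{ runq []*g }

// runqsteal moves half of victim's local queue (rounded up) to thief
// and returns how many Gs were taken.
func runqsteal(thief, victim *p) int {
	n := len(victim.runq) - len(victim.runq)/2
	if n == 0 {
		return 0
	}
	stolen := victim.runq[len(victim.runq)-n:]
	victim.runq = victim.runq[:len(victim.runq)-n]
	thief.runq = append(thief.runq, stolen...)
	return n
}

func main() {
	p1 := &p{runq: []*g{{1}, {2}, {3}, {4}, {5}, {6}}}
	p2 := &p{} // empty local queue: P2 becomes the thief

	taken := runqsteal(p2, p1)
	fmt.Println(taken, len(p1.runq), len(p2.runq)) // prints 3 3 3
}
```

After the steal, both Ps have three runnable Gs, matching the P1/P2 example above.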

GODEBUG

The GODEBUG variable controls debugging behavior inside the runtime. Its value is a comma-separated list of name=val pairs. For observing the scheduler, this article uses two of them:

  • schedtrace: setting schedtrace=X makes the runtime print a one-line scheduler summary to standard error every X milliseconds.
  • scheddetail: setting schedtrace=X together with scheddetail=1 makes the runtime print a detailed multi-line report every X milliseconds, covering the state of the scheduler, each processor, each OS thread, and each goroutine.

Demo code

package main

import "sync"

func main() {
    wg := sync.WaitGroup{}
    wg.Add(10)
    for i := 0; i < 10; i++ {
        go func(wg *sync.WaitGroup) {
            var counter int
            for i := 0; i < 1e10; i++ {
                counter++
            }
            wg.Done()
        }(&wg)
    }

    wg.Wait()
}

schedtrace

$ GODEBUG=schedtrace=1000 ./awesomeProject 
SCHED 0ms: gomaxprocs=4 idleprocs=1 threads=5 spinningthreads=1 idlethreads=0 runqueue=0 [0 0 0 0]
SCHED 1000ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 2000ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 3001ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 4010ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 5011ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 6012ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 [1 2 2 1]
SCHED 7021ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 8023ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 9031ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 10033ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 11038ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 12044ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 13051ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=4 [0 1 1 0]
SCHED 14052ms: gomaxprocs=4 idleprocs=2 threads=5 
...
  • SCHED: each line is one scheduler debug sample; the milliseconds value is the time elapsed since program start. The output interval is controlled by the schedtrace value.
  • gomaxprocs: the current value of GOMAXPROCS.
  • idleprocs: the number of idle processors (Ps).
  • threads: the number of OS threads.
  • spinningthreads: the number of OS threads in the spinning state.
  • idlethreads: the number of idle OS threads.
  • runqueue: the number of Gs in the global queue; the trailing [1 2 2 1] gives the number of Gs in each of the four Ps' local queues.

We mentioned "spinning threads" above. If you have not met the concept before, "spinning" can be confusing, so we quote "Head First of Golang Scheduler" to explain it:

The Go scheduler's designers introduced the "spinning thread" concept after weighing OS resource utilization against the load that frequent thread preemption places on the OS. When a spinning thread finds no goroutine it can schedule, it is not destroyed; instead it spins, keeping itself available. This looks like a waste of resources, but in syscall-heavy scenarios, frequent preemption between threads and frequent thread creation and destruction cost more than spinning does.

scheddetail

To see the scheduler's state in full detail, add the scheddetail parameter, which exposes the scheduling logic further, as follows:

$ GODEBUG=scheddetail=1,schedtrace=1000 ./awesomeProject
SCHED 1000ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=0 idlethreads=0 runqueue=0 gcwaiting=0 nmidlelocked=0 stopwait=0 sysmonwait=0
  P0: status=1 schedtick=2 syscalltick=0 m=3 runqsize=3 gfreecnt=0
  P1: status=1 schedtick=2 syscalltick=0 m=4 runqsize=1 gfreecnt=0
  P2: status=1 schedtick=2 syscalltick=0 m=0 runqsize=1 gfreecnt=0
  P3: status=1 schedtick=1 syscalltick=0 m=2 runqsize=1 gfreecnt=0
  M4: p=1 curg=18 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  M3: p=0 curg=22 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  M2: p=3 curg=24 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  M1: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=1 dying=0 spinning=false blocked=false lockedg=-1
  M0: p=2 curg=26 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 spinning=false blocked=false lockedg=-1
  G1: status=4(semacquire) m=-1 lockedm=-1
  G2: status=4(force gc (idle)) m=-1 lockedm=-1
  G3: status=4(GC sweep wait) m=-1 lockedm=-1
  G17: status=1() m=-1 lockedm=-1
  G18: status=2() m=4 lockedm=-1
  G19: status=1() m=-1 lockedm=-1
  G20: status=1() m=-1 lockedm=-1
  G21: status=1() m=-1 lockedm=-1
  G22: status=2() m=3 lockedm=-1
  G23: status=1() m=-1 lockedm=-1
  G24: status=2() m=2 lockedm=-1
  G25: status=1() m=-1 lockedm=-1
  G26: status=2() m=0 lockedm=-1

Here we examine the debug output at the 1000ms mark. There is a lot of information, so let's go through it field by field, as follows:

G

  • status: the G's run state.
  • m: the M the G is attached to (-1 if none).
  • lockedm: the M the G is locked to, if any (-1 if none).

The first field, the G's run state, is very useful when analyzing the runtime's internal flow. It involves the following nine states:

  • _Gidle (0): just allocated, not yet initialized.
  • _Grunnable (1): on a run queue; not executing user code yet.
  • _Grunning (2): executing user code; not on a run queue; assigned both an M and a P.
  • _Gsyscall (3): executing a system call; assigned an M, not executing user code.
  • _Gwaiting (4): blocked in the runtime; not executing user code and not on a run queue; parked somewhere waiting for an event.
  • _Gmoribund_unused (5): currently unused, but hard-coded in gdb scripts.
  • _Gdead (6): currently unused; the G may have just exited or just been initialized; it is not executing user code and may or may not have a stack allocated.
  • _Genqueue_unused (7): currently unused.
  • _Gcopystack (8): the G's stack is being copied; not executing user code and not on a run queue.

With the states understood, let's revisit the output above:

G1: status=4(semacquire) m=-1 lockedm=-1
G2: status=4(force gc (idle)) m=-1 lockedm=-1
G3: status=4(GC sweep wait) m=-1 lockedm=-1
G17: status=1() m=-1 lockedm=-1
G18: status=2() m=4 lockedm=-1

In this fragment, G1 is in _Gwaiting, with no M assigned and no locked M. You may be curious about the text in parentheses. status=4 tells us the goroutine is blocked in the runtime, and the parentheses name the event that stopped it, here semacquire: when semacquire fails to obtain the semaphore, it calls goparkunlock, which places the current goroutine on a wait queue and sets it to the _Gwaiting state.

What, then, can trigger these waits in practice? Let's look at the full list of wait reasons, as follows:

    waitReasonZero                                    // ""
    waitReasonGCAssistMarking                         // "GC assist marking"
    waitReasonIOWait                                  // "IO wait"
    waitReasonChanReceiveNilChan                      // "chan receive (nil chan)"
    waitReasonChanSendNilChan                         // "chan send (nil chan)"
    waitReasonDumpingHeap                             // "dumping heap"
    waitReasonGarbageCollection                       // "garbage collection"
    waitReasonGarbageCollectionScan                   // "garbage collection scan"
    waitReasonPanicWait                               // "panicwait"
    waitReasonSelect                                  // "select"
    waitReasonSelectNoCases                           // "select (no cases)"
    waitReasonGCAssistWait                            // "GC assist wait"
    waitReasonGCSweepWait                             // "GC sweep wait"
    waitReasonChanReceive                             // "chan receive"
    waitReasonChanSend                                // "chan send"
    waitReasonFinalizerWait                           // "finalizer wait"
    waitReasonForceGCIdle                             // "force gc (idle)"
    waitReasonSemacquire                              // "semacquire"
    waitReasonSleep                                   // "sleep"
    waitReasonSyncCondWait                            // "sync.Cond.Wait"
    waitReasonTimerGoroutineIdle                      // "timer goroutine (idle)"
    waitReasonTraceReaderBlocked                      // "trace reader (blocked)"
    waitReasonWaitForGCCycle                          // "wait for GC cycle"
    waitReasonGCWorkerIdle                            // "GC worker (idle)"

The waitReason list above shows every reason a goroutine may be suspended; the parenthesized text in the output is simply the corresponding string.

M

  • p: the P the M is bound to (-1 if none).
  • curg: the G the M is currently running.
  • mallocing: whether the M is allocating memory.
  • throwing: whether the M is throwing an exception.
  • preemptoff: if non-empty, keep curg running on this M (preemption disabled).
  • locks: the number of locks the M holds.
  • spinning: whether the M is in the spinning state.
  • blocked: whether the M is blocked on a runtime note.
  • lockedg: the G the M is locked to, if any (-1 if none).

P

  • status: the P's run state.
  • schedtick: the number of times this P has scheduled a G.
  • syscalltick: the number of system calls made on this P.
  • m: the M the P is bound to (-1 if none).
  • runqsize: the number of Gs in this P's local run queue.
  • gfreecnt: the number of free Gs (in _Gdead state) available for reuse.

The state values for P are:

  • _Pidle (0): just allocated, not yet initialized.
  • _Prunning (1): bound to an M and running; the transition happens when acquirep binds the M to the P.
  • _Psyscall (2): executing a system call.
  • _Pgcstop (3): stopped for garbage collection; the P stays in this state until the GC finishes.
  • _Pdead (4): no longer used (for example, after GOMAXPROCS shrinks).

summary

In this article we covered some scheduling basics and then used the magic of GODEBUG to observe the scheduler in action. As you might guess, this combines well with go tool trace from my previous article; in practice there are many such complementary tools, and combining them skillfully is the key.

Added by techevan on Mon, 19 Aug 2019 15:13:16 +0300