Go concurrent scheduling advanced GMP initialization is the most difficult thing. Sometimes it's very simple to read it patiently

Go concurrent scheduling advanced - [gongzong No.: stack future]

Please step over here

2. GMP initialization

1. Initialization of M

M has only two states: spin and non spin. When I spin, I will try to find a job; When it cannot be found, it will enter the non spin state, and then it will sleep until it is awakened by other worker threads and enter the spin state again when there is work to be processed.

// src/runtime/proc.go

func mcommoninit(mp *m, id int64) {
 _g_ := getg()
 ...
 lock(&sched.lock)
 ...
 //random initialization, used to steal G
 mp.fastrand[0] = uint32(int64Hash(uint64(mp.id), fastrandseed))
 mp.fastrand[1] = uint32(int64Hash(uint64(cputicks()), ^fastrandseed))
 if mp.fastrand[0]|mp.fastrand[1] == 0 {
  mp.fastrand[1] = 1
 }
  
 //To create a gsignal for signal processing, simply allocate a g structure object from the heap, then set the stack and return
 mpreinit(mp)
 if mp.gsignal != nil {
  mp.gsignal.stackguard1 = mp.gsignal.stack.lo + _StackGuard
 }

 //Hang ^ M ^ into the global linked list allm
 mp.alllink = allm
 ...
}

The id passed in here is - 1. The first call will set the id to 0. There is no initialization related to scheduling for M0. Therefore, it can be simply considered that this function just puts M0 into the global linked list allm and returns. Of course, this M0 is the main M.

2. Initialization of P

  • Usually (the number of P is not adjusted when the program is running), P will only switch under the four states in the above figure. When the program starts running for initialization, all P are in_ The Pgcstop state will be placed in the following state with the initialization of P (runtime. Precisize)_ Pidle.

  • When M needs to run, it will run time Acquire to change P to pruning state and run it through runtime Release to release.

  • When G needs to enter the system call during execution, P will be set to_ Psyscall. If it is robbed by the system monitoring (runtime.retake) at this time, P will be modified to_ Pidle.

  • If a GC occurs while the program is running, P is set to_ Pgcstop, and at runtime Readjust to when starttheworld_ Prunning.

var allp       []*p 

func procresize(nprocs int32) *p {
 //Get the number of previous
 old := gomaxprocs
 //Update statistics
 now := nanotime()
 if sched.procresizetime != 0 {
  sched.totaltime += int64(old) * (now - sched.procresizetime)
 }
 sched.procresizetime = now
 //According to runtime Maxgoprocs , adjusts the number of , p , because , runtime Maxgoprocs # can be set by the user
 if nprocs > int32(len(allp)) { 
  lock(&allpLock)
  if nprocs <= int32(cap(allp)) {
   allp = allp[:nprocs]
  } else {
   nallp := make([]*p, nprocs) 
   copy(nallp, allp[:cap(allp)])
   allp = nallp
  }
  unlock(&allpLock)
 }
 
 //Initialize a new {P
 for i := old; i < nprocs; i++ {
  pp := allp[i]
  //If it is blank, apply for a new {P} object
  if pp == nil {
   pp = new(p)
  }
  pp.init(i)
  atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
 }

 _g_ := getg()
 //If P , is not empty and , id , is less than , nprocs, you can continue to use the current , P
 if _g_.m.p != 0 && _g_.m.p.ptr().id < nprocs {
  // continue to use the current P
  _g_.m.p.ptr().status = _Prunning
  _g_.m.p.ptr().mcache.prepareForSweep()
 } else { 
  //Release the current {P because it has expired
  if _g_.m.p != 0 { 
   _g_.m.p.ptr().m = 0
  }
  _g_.m.p = 0
  p := allp[0]
  p.m = 0
  p.status = _Pidle
  //P0 ^ bound to the current ^ M0
  acquirep(p) 
 }
 //Free resources from unused resources
 for i := nprocs; i < old; i++ {
  p := allp[i]
  p.destroy() 
  //You cannot release # p # itself because it may be referenced when # m # enters the system call
 }
 //Reset the length of allp after releasing {P}
 if int32(len(allp)) != nprocs {
  lock(&allpLock)
  allp = allp[:nprocs]
  unlock(&allpLock)
 }
 var runnablePs *p
 //Put the P # without local tasks into the free linked list
 for i := nprocs - 1; i >= 0; i-- {
  p := allp[i]
  //Currently in use {P} skipped
  if _g_.m.p.ptr() == p {
   continue
  }
  //Set status to_ Pidle 
  p.status = _Pidle
  //Whether the task list of P # is empty
  if runqempty(p) {
   //Put in the free list
   pidleput(p)
  } else {
   //Get free ^ M ^ bind to ^ P ^
   p.m.set(mget())
            // 
   p.link.set(runnablePs)
   runnablePs = p
  }
 }
 stealOrder.reset(uint32(nprocs))
 var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
 atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
 return runnablePs
}

The execution process of the precisize method is as follows:

  1. Allp is the resource pool of global variable P. if the number of processors in the slice of allp is less than the expected number, the slice will be expanded;

  2. During capacity expansion, new will be used to apply for a new P, and then init will be used for initialization. It should be noted that the id of the initialized P is the value of i passed in, and the status is_ Pgcstop;

  3. Then obtain M0 through g.m.p. if M0 has been bound to a valid P, modify the state of the bound p to_ Prunning. Otherwise, get allp[0] as P0 and call runtime Acquirep is bound with M0;

  4. P that exceeds the number of processors releases resources through p.destroy. P.destroy releases resources related to P and sets the P status to_ Pdead;

  5. The length of the global variable allp is changed by truncation to ensure that it is equal to the expected number of processors;

  6. Traverse allp to check whether P is idle. If yes, put it into the idle list;

3. Initialization of G

This is the state flow diagram of g. the initialization of G is quite complex. You need to go down and look at the source code again.

func newproc(siz int32, fn *funcval) {
 //The first parameter address is obtained by increasing the length of a pointer from the address of fn
 argp := add(unsafe.Pointer(&fn), sys.PtrSize)
 gp := getg()
 pc := getcallerpc() //Gets the caller's PC/IP} register value
  
 //Create a # Goroutine # object with # g0 # system stack
 //The parameters passed include: fn function entry address, argp parameter start address, siz parameter length, gp (g0), caller {pc (goroutine)
 systemstack(func() {
  newg := newproc1(fn, argp, siz, gp, pc)

  _p_ := getg().m.p.ptr() //Get p
  runqput(_p_, newg, true) //And put it into the queue header or global queue of the {P} local queue

//Check the idle P, wake it up and prepare to execute G, but we are currently in the initialization stage and the main Goroutine has not started to execute, so P will not be awakened here.
  if mainStarted {
   wakep()
  }
 })
}

//Create a new g running fn with a parameter of narg byte size, starting with argp.
// callerps is the starting address of the go statement. The newly created g will be put into the queue of G and wait to run.
func newproc1(fn *funcval, argp unsafe.Pointer, narg int32, callergp *g, callerpc uintptr) *g {
 _g_ := getg()//Because it is running on the system stack, the ¢ g ¢ at this time is ¢ g0
 acquirem() //No preemption
 if fn == nil {
  _g_.m.throwing = -1 // do not dump full stacks
  throw("go of nil func value")
 }

 ...
 siz := narg
 siz = (siz + 7) &^ 7

 
 ...

//The {p to which the current worker is bound
//During initialization_ p_  = g0.m.p, that is_ p_  = allp[0]
 _p_ := _g_.m.p.ptr()
//Get an unused , g from the local buffer of , p , which is null during initialization and returns , nil
 newg := gfget(_p_)
 if newg == nil {
  //Create an owner_ StackMin = g of stack size
  newg = malg(_StackMin)
  //Remove the newly created {g} from_ Gidle update to_ Gdead status
  casgstatus(newg, _Gidle, _Gdead)
  //Add # g # in the # Gdead # state to # allg so that # GC # does not scan uninitialized stacks
  allgadd(newg) 
 }
 if newg.stack.hi == 0 {
  throw("newproc1: newg missing stack")
 }

 if readgstatus(newg) != _Gdead {
  throw("newproc1: new g is not Gdead")
 }

 ...
 //Determine sp location
 sp := newg.stack.hi - totalSize
 //Determine the parameter stack position
 spArg := sp
  
 ...
 
 if narg > 0 {
  //Copy the parameters from the stack executing the # newproc # function to the stack of the new # g #
  memmove(unsafe.Pointer(spArg), argp, uintptr(narg))
  ...
 }

 //Set the scheduling related information of newg
 memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
 newg.sched.sp = sp
 newg.stktopsp = sp
 newg.sched.pc = funcPC(goexit) + sys.PCQuantum 
 newg.sched.g = guintptr(unsafe.Pointer(newg))
 gostartcallfn(&newg.sched, fn)
 newg.gopc = callerpc
 newg.ancestors = saveAncestors(callergp)
 newg.startpc = fn.fn
 if _g_.m.curg != nil {
  newg.labels = _g_.m.curg.labels
 }
 if isSystemGoroutine(newg, false) {
  atomic.Xadd(&sched.ngsys, +1)
 }
  
 //Set the status of # g # to_ Grunnable, ready to run
 casgstatus(newg, _Gdead, _Grunnable)

 //Set goid
 newg.goid = int64(_p_.goidcache)
 _p_.goidcache++
  
 ...
 releasem(_g_.m) //Resuming preemption is essentially locking
 return newg
}

The process of creating G is also relatively complex. Let's summarize this process:

  1. First, try to get the executed g from the P local gfree linked list or the global gfree queue

  2. During initialization, it is impossible for the program to obtain g, whether it is a local queue or a global queue. Therefore, a new G is created and its running thread (execution stack) is allocated. At this time, G is in_ Gidle status

  3. After creation, g is changed to_ Gdead state, initialize the SP of the execution stack and the stack position of the parameters according to the entry address and parameters of the function to be executed, and store a copy of the required parameters in the execution stack

  4. According to SP and parameters, save SP and PC pointers in g.sched to initialize the operation site of G

  5. Save the caller and the entry PC of the function to be executed, and change the state of g to_ Grunnable

  6. Assign an id to Goroutine and put it into the queue header or global queue of P local queue (the queue is certainly not full at the initialization stage, so it is impossible to put it into the global queue)

  7. Check the idle P, wake it up and prepare to execute G, but we are currently in the initialization stage and the main Goroutine has not started to execute, so P will not be awakened here.

4. Summary

Combined with the GMP structure of the previous article and the initialization of this article, I believe I will have a certain understanding of GMP scheduling. At least you know some bits and pieces of the bottom layer. The follow-up articles will continue to explain GMP scheduling. You insist on learning until you finally find that the most difficult GMP is that!

Gongzong No.: future

So that many coder s in the confused stage can find light from here, stack creation, contribute to the present and benefit the future

Added by blacklotus on Fri, 07 Jan 2022 10:28:55 +0200