Original: Go Source: Coprocess Stack
Tips
- Go version 1.12
- Inspired by Contiguous stacks
- When it comes to implementation details, you need to Stack Frame And pointer operation basis.
Preface
Stack management use of go prior to version 1.4 Segmented stack Mechanism implementation.Implementation: When a function is detected to require more stacks, a new stack is allocated, and the old stack and the new stack are connected using a pointer, and the function is released upon return.There are two problems with such a mechanism:
- There is a "hot split" problem when calling the same function multiple times in a loop, for example: stacksplit.go
- Additional consumption per allocation and release
To solve these two problems, the official use is continuous stacks.Continuous stack implementation: When more stacks are detected, a stack twice as large as the original is allocated, the old stack data is copied to the new stack, and the old stack is released.
Contiguous Stack
The stack has a large amount of code to expand and shrink, so it has been greatly streamlined.Before looking at the source code for a continuous stack, consider the following questions:
- What trigger conditions for expansion and shrinking?
- How can the size of expansion and shrinkage be calculated?
- What does this process do?Does it affect performance?
Stack expansion
func newstack() { thisg := getg() ...... gp := thisg.m.curg ...... // Allocate a bigger segment and move the stack. oldsize := gp.stack.hi - gp.stack.lo newsize := oldsize * 2 // Twice as big as it used to be ...... // The goroutine must be executing in order to call newstack, // so it must be Grunning (or Gscanrunning). casgstatus(gp, _Grunning, _Gcopystack) //Modify protocol state // The concurrent GC will not scan the stack while we are doing the copy since // the gp is in a Gcopystack status. copystack(gp, newsize, true) //As we will see below ...... casgstatus(gp, _Gcopystack, _Grunning) gogo(&gp.sched) }
Each function takes up stack space to hold variables, parameters, and so on.A function running in a coprocess naturally occupies the stack of coprocesses running it.However, the stack of a protocol is limited, and if it finds that it is not enough, it calls stackalloc to allocate a new stack twice as large.
Stack Compression
func shrinkstack(gp *g) { gstatus := readgstatus(gp) ...... oldsize := gp.stack.hi - gp.stack.lo newsize := oldsize / 2 // Twice smaller than before // Don't shrink the allocation below the minimum-sized stack // allocation. if newsize < _FixedStack { return } // Compute how much of the stack is currently in use and only // shrink the stack if gp is using less than a quarter of its // current stack. The currently used stack includes everything // down to the SP plus the stack guard space that ensures // there's room for nosplit functions. avail := gp.stack.hi - gp.stack.lo //Shrink when used stacks account for less than a quarter of the total stack if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 { return } copystack(gp, newsize, false) //As we will see below }
Stack shrinking occurs mainly during GC.A protocol becomes resident and takes up a lot of memory when it is busy, but less when it is idle, which wastes a lot of memory. To avoid wasting Go shrinking the stack of the protocol when it is in GC, the shrinking also allocates a new memory to replace the original, which is only 1/2 the size.
What does this process do?
func copystack(gp *g, newsize uintptr, sync bool) { ...... old := gp.stack ...... used := old.hi - gp.sched.sp // allocate new stack new := stackalloc(uint32(newsize)) ...... // Compute adjustment. var adjinfo adjustinfo adjinfo.old = old adjinfo.delta = new.hi - old.hi //Adjustment for old stack pointer //Later opportunity to analyze with select / chan // Adjust sudogs, synchronizing with channel ops if necessary. ncopy := used if sync { adjustsudogs(gp, &adjinfo) } else { ...... adjinfo.sghi = findsghi(gp, old) // Synchronize with channel ops and copy the part of // the stack they may interact with. ncopy -= syncadjustsudogs(gp, used, &adjinfo) } //Copy old stack data to new stack // Copy the stack (or the rest of it) to the new location memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy) // Adjust remaining structures that have pointers into stacks. // We have to do most of these before we traceback the new // stack because gentraceback uses them. adjustctxt(gp, &adjinfo) adjustdefers(gp, &adjinfo) adjustpanics(gp, &adjinfo) ...... // Swap out old stack for new one gp.stack = new gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request gp.sched.sp = new.hi - used gp.stktopsp += adjinfo.delta // Adjust pointers in the new stack. gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0) ...... //Release Old Stack stackfree(old) }
Many adjustments have been made during the process of expanding and shrinking.From the implementation of continuous stacks, we know that whether it's expansion or contraction, we reapply for a new stack, and then copy the data from the old stack to the new stack.The physical memory used by the collaboration has been completely replaced, and Go s will save the pointer in memory when running, for example, gp.sched.ctxt, gp._defer, gp._panic, including the pointer in the function.This part of the pointer value is converted to an integer uintptr and then adjusted with + delta.
func adjustpointer(adjinfo *adjustinfo, vpp unsafe.Pointer) { pp := (*uintptr)(vpp) p := *pp ...... //If this integer number is in the range of the old stack, adjust if adjinfo.old.lo <= p && p < adjinfo.old.hi { *pp = p + adjinfo.delta ...... } }
Frame Adjustment
If you just want to know the scaling of the stack, that's enough.This part goes into the details and is not interesting to skip.Before you learn about Frame adjustments, look at Stack Frame .Stack Frame: The memory space occupied by a function when it runs is a collection of data on the stack that includes:
- Local variables
- Saved copies of registers modified by subprograms that could need restoration
- Argument parameters
- Return address
FP,SP,PC ,LR
- FP: Frame Pointer
– Points to the bottom of the argument list
- SP: Stack Pointer
– Points to the top of the space allocated for local variables
- PC: Program Counter
- LR: Caller's Program Counter
Stack frame layout
// (x86) // +------------------+ // | args from caller | // +------------------+ <- frame->argp // | return address | // +------------------+ // | caller's BP (*) | (*) if framepointer_enabled && varp < sp // +------------------+ <- frame->varp // | locals | // +------------------+ // | args to callee | // +------------------+ <- frame->sp
For X86 and ARM in Go Stack frame layout It will be different, here only X86 is analyzed.
To visualize the results of the Frame adjustment, let's look at the following examples:
func bb(a *int, aa *int) { var v1 int println("v1 before morestack", uintptr(unsafe.Pointer(&v1))) cc(0) println("a after morestack", uintptr(unsafe.Pointer(a))) println("aa after morestack", uintptr(unsafe.Pointer(aa))) println("v1 after morestack", uintptr(unsafe.Pointer(&v1))) } // for morestack func cc(i int){ i++ if i >= 30 { println("morestack done") }else{ cc(i) } } func main() { wg := sync.WaitGroup{} wg.Add(1) go func() { var a, aa int a = 1000 aa = 1000 println("a before morestack", uintptr(unsafe.Pointer(&a))) println("aa before morestack", uintptr(unsafe.Pointer(&aa))) bb(&a, &aa) wg.Done() }() wg.Wait() }
Result:
a before morestack 824633925560 aa before morestack 824633925552 v1 before morestack 824633925504 morestack done a after morestack 824634142648 aa after morestack 824634142640 v1 after morestack 824634142592
From the results, we can see that bb's parameters a, a A and variable v1 address have changed after expansion. How does this change happen?We will focus on the following three issues:
- How to confirm the location of the function Frame
- How to find function parameters, pointers to variables
- How to confirm the Frame of a parent function
from gentraceback start
func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max int, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer, flags uint) int { ...... g := getg() ...... if pc0 == ^uintptr(0) && sp0 == ^uintptr(0) { // Signal to fetch saved values from gp. if gp.syscallsp != 0 { ...... } else { //Running Location pc0 = gp.sched.pc sp0 = gp.sched.sp ...... } } nprint := 0 var frame stkframe frame.pc = pc0 frame.sp = sp0 ...... f := findfunc(frame.pc) ...... frame.fn = f n := 0 for n < max { ...... f = frame.fn if f.pcsp == 0 { // No frame information, must be external function, like race support. // See golang.org/issue/13568. break } ...... if frame.fp == 0 { sp := frame.sp ...... //Calculate FP frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache)) if !usesLR { // On x86, call instruction pushes return PC before entering new function. frame.fp += sys.RegSize } } var flr funcInfo if topofstack(f, gp.m != nil && gp == gp.m.g0) { ...... } else if usesLR && f.funcID == funcID_jmpdefer { ...... } else { var lrPtr uintptr if usesLR { ...... } else { if frame.lr == 0 { //Gets the PC value of the calling function lrPtr = frame.fp - sys.RegSize frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr))) } } flr = findfunc(frame.lr) ...... } frame.varp = frame.fp if !usesLR { // On x86, call instruction pushes return PC before entering new function. frame.varp -= sys.RegSize } ...... if framepointer_enabled && GOARCH == "amd64" && frame.varp > frame.sp { frame.varp -= sys.RegSize } ...... if callback != nil || printing { frame.argp = frame.fp + sys.MinFrameSize ...... } ...... //Currently adjusting frame if callback != nil { if !callback((*stkframe)(noescape(unsafe.Pointer(&frame))), v) { return n } } ...... n++ skipped: ...... //Confirm Parent Frame // Unwind to next frame. frame.fn = flr frame.pc = frame.lr frame.lr = 0 frame.sp = frame.fp frame.fp = 0 frame.argmap = nil ...... } ...... return n }
gentraceback There's a lot of code, so here's a simplification of adjusting the parameters passed in to the Frame and what we're going to explore.It's still a long time after simplification. Don't worry, we'll strip out this function one by one.
-
Confirm Current Location
Go runtime has saved the PC to gp.sched.pc and SP to gp.sched.sp when scaling occurred.
-
Find Function Information
Function parameters, number of variables, frame size, file line and other information, compiled and saved into the execution file, when executed, loaded into memory, this part of the data can be obtained by PC: findfunc -> findmoduledatap
func findmoduledatap(pc uintptr) *moduledata { for datap := &firstmoduledata; datap != nil; datap = datap.next { if datap.minpc <= pc && pc < datap.maxpc { return datap } } return nil }
- Calculate FP
frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache))
SP can be understood as the top of the function, FP is the bottom of the function, with SP, there is no function length (frame size).Actually we can get it from pcsp because it has been mapped into memory. See Go 1.2 Runtime Symbol Information .Knowing FP and SP, we know where the function is on the cockpit.
-
Get parent function PC directive (LR)
lrPtr = frame.fp - sys.RegSize frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
The PC instruction of the parent function is placed in the return address position of the stack frame graph, and we can take it out directly, according to this instruction we get the information of the parent function.
- Confirm Parent Function Frame
frame.fn = flr frame.pc = frame.lr frame.lr = 0 frame.sp = frame.fp frame.fp = 0 frame.argmap = nil
From the stack frame diagram, you can see that the FP of the child function is equal to the SP of the parent function.Knowing the SP and PC of the parent function, repeat the steps above to find out the entire call chain where the function is located, which is what we usually see in the call chain of panic.
with adjustframe End
func adjustframe(frame *stkframe, arg unsafe.Pointer) bool { adjinfo := (*adjustinfo)(arg) ...... f := frame.fn ...... locals, args := getStackMap(frame, &adjinfo.cache, true) // Adjust local variables if stack frame has been allocated. if locals.n > 0 { size := uintptr(locals.n) * sys.PtrSize adjustpointers(unsafe.Pointer(frame.varp-size), &locals, adjinfo, f) } // Adjust saved base pointer if there is one. if sys.ArchFamily == sys.AMD64 && frame.argp-frame.varp == 2*sys.RegSize { ...... adjustpointer(adjinfo, unsafe.Pointer(frame.varp)) } // Adjust arguments. if args.n > 0 { ...... adjustpointers(unsafe.Pointer(frame.argp), &args, adjinfo, f) } return true }
adopt gentraceback Get the frame's exact location on the stack, combine Stack frame layout So we know the function parameter argp and the variable varp address.On 64-bit systems, each pointer takes up 8 bytes.With 8 as the step size, you can derive and adjust the pointers in the function parameters and variables.
Come here, the source code analysis of the consortium stack has been completed. From the above we know how to implement the continuous stack, and have a lot of benefits. Next, look at the shortcomings and benefits of the continuous stack.
Disadvantages of continuous stacks
Continuous stacks solve two problems with piecewise stacks, but there are other problems with this implementation:
- More virtual memory fragmentation.Especially if you need a larger stack, allocating a contiguous block of memory becomes more difficult
- Pointers are restricted to the stack.Pointers for two protocols are not allowed to point to each other in go.This increases the complexity of the implementation.
Profit
This data comes from Contiguous stacks.
- Stack growth doubled by 10%, increased by 50% by only 2%, and increased by 25% by 20%.
- Hot split performance issue.
segmented stacks: no split: 1.25925147s with split: 5.372118558s <- Here we go hot split problem both split: 1.293200571s contiguous stacks: no split: 1.261624848s with split: 1.262939769s both split: 1.29008309s