Go Source: Coprocess Stack

Tips

Go version 1.12
Inspired by Contiguous stacks
When it comes to implementation details, you need to Stack Frame And pointer operation basis.

Preface

Stack management use of go prior to version 1.4 Segmented stack Mechanism implementation.Implementation: When a function is detected to require more stacks, a new stack is allocated, and the old stack and the new stack are connected using a pointer, and the function is released upon return.There are two problems with such a mechanism:

There is a "hot split" problem when calling the same function multiple times in a loop, for example: stacksplit.go
Additional consumption per allocation and release

To solve these two problems, the official use is continuous stacks.Continuous stack implementation: When more stacks are detected, a stack twice as large as the original is allocated, the old stack data is copied to the new stack, and the old stack is released.

Contiguous Stack

The stack has a large amount of code to expand and shrink, so it has been greatly streamlined.Before looking at the source code for a continuous stack, consider the following questions:

What trigger conditions for expansion and shrinking?
How can the size of expansion and shrinkage be calculated?
What does this process do?Does it affect performance?

Stack expansion

func newstack() {
    thisg := getg()
    ......
    gp := thisg.m.curg
    ......
    // Allocate a bigger segment and move the stack.
    oldsize := gp.stack.hi - gp.stack.lo
    newsize := oldsize * 2 // Twice as big as it used to be
    ......
    // The goroutine must be executing in order to call newstack,
    // so it must be Grunning (or Gscanrunning).
    casgstatus(gp, _Grunning, _Gcopystack) //Modify protocol state

    // The concurrent GC will not scan the stack while we are doing the copy since
    // the gp is in a Gcopystack status.
    copystack(gp, newsize, true) //As we will see below
    ......
    casgstatus(gp, _Gcopystack, _Grunning)
    gogo(&gp.sched)
}

Each function takes up stack space to hold variables, parameters, and so on.A function running in a coprocess naturally occupies the stack of coprocesses running it.However, the stack of a protocol is limited, and if it finds that it is not enough, it calls stackalloc to allocate a new stack twice as large.

Stack Compression

func shrinkstack(gp *g) {
    gstatus := readgstatus(gp)
    ......
    oldsize := gp.stack.hi - gp.stack.lo
    newsize := oldsize / 2 // Twice smaller than before
    // Don't shrink the allocation below the minimum-sized stack
    // allocation.
    if newsize < _FixedStack {
        return
    }
    // Compute how much of the stack is currently in use and only
    // shrink the stack if gp is using less than a quarter of its
    // current stack. The currently used stack includes everything
    // down to the SP plus the stack guard space that ensures
    // there's room for nosplit functions.
    avail := gp.stack.hi - gp.stack.lo
    //Shrink when used stacks account for less than a quarter of the total stack
    if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 {
        return
    }

    copystack(gp, newsize, false) //As we will see below
}

Stack shrinking occurs mainly during GC.A protocol becomes resident and takes up a lot of memory when it is busy, but less when it is idle, which wastes a lot of memory. To avoid wasting Go shrinking the stack of the protocol when it is in GC, the shrinking also allocates a new memory to replace the original, which is only 1/2 the size.

What does this process do?

func copystack(gp *g, newsize uintptr, sync bool) {
    ......
    old := gp.stack
    ......
    used := old.hi - gp.sched.sp

    // allocate new stack
    new := stackalloc(uint32(newsize))
    ......
    // Compute adjustment.
    var adjinfo adjustinfo
    adjinfo.old = old
    adjinfo.delta = new.hi - old.hi //Adjustment for old stack pointer

    //Later opportunity to analyze with select / chan
    // Adjust sudogs, synchronizing with channel ops if necessary.
    ncopy := used
    if sync {
        adjustsudogs(gp, &adjinfo)
    } else {
        ......
        adjinfo.sghi = findsghi(gp, old)

        // Synchronize with channel ops and copy the part of
        // the stack they may interact with.
        ncopy -= syncadjustsudogs(gp, used, &adjinfo)
    }
    //Copy old stack data to new stack
    // Copy the stack (or the rest of it) to the new location
    memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)

    // Adjust remaining structures that have pointers into stacks.
    // We have to do most of these before we traceback the new
    // stack because gentraceback uses them.
    adjustctxt(gp, &adjinfo)
    adjustdefers(gp, &adjinfo)
    adjustpanics(gp, &adjinfo)
    ......
    // Swap out old stack for new one
    gp.stack = new
    gp.stackguard0 = new.lo + _StackGuard // NOTE: might clobber a preempt request
    gp.sched.sp = new.hi - used
    gp.stktopsp += adjinfo.delta
    // Adjust pointers in the new stack.
    gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0)
    ......
    //Release Old Stack
    stackfree(old)
}

Many adjustments have been made during the process of expanding and shrinking.From the implementation of continuous stacks, we know that whether it's expansion or contraction, we reapply for a new stack, and then copy the data from the old stack to the new stack.The physical memory used by the collaboration has been completely replaced, and Go s will save the pointer in memory when running, for example, gp.sched.ctxt, gp._defer, gp._panic, including the pointer in the function.This part of the pointer value is converted to an integer uintptr and then adjusted with + delta.

func adjustpointer(adjinfo *adjustinfo, vpp unsafe.Pointer) {
    pp := (*uintptr)(vpp)
    p := *pp
    ......
    //If this integer number is in the range of the old stack, adjust
    if adjinfo.old.lo <= p && p < adjinfo.old.hi {
        *pp = p + adjinfo.delta
        ......
    }
}

Frame Adjustment

If you just want to know the scaling of the stack, that's enough.This part goes into the details and is not interesting to skip.Before you learn about Frame adjustments, look at Stack Frame .Stack Frame: The memory space occupied by a function when it runs is a collection of data on the stack that includes:

Local variables
Saved copies of registers modified by subprograms that could need restoration
Argument parameters
Return address

FP，SP，PC ，LR

FP: Frame Pointer
– Points to the bottom of the argument list
SP: Stack Pointer
– Points to the top of the space allocated for local variables
PC: Program Counter
LR: Caller's Program Counter

Stack frame layout

// (x86)  
// +------------------+  
// | args from caller |  
// +------------------+ <- frame->argp  
// |  return address  |  
// +------------------+  
// |  caller's BP (*) | (*) if framepointer_enabled && varp < sp  
// +------------------+ <- frame->varp  
// |     locals       |  
// +------------------+  
// |  args to callee  |  
// +------------------+ <- frame->sp

For X86 and ARM in Go Stack frame layout It will be different, here only X86 is analyzed.

To visualize the results of the Frame adjustment, let's look at the following examples:

func bb(a *int, aa *int) {
    var v1 int
    println("v1 before morestack", uintptr(unsafe.Pointer(&v1)))

    cc(0)

    println("a after morestack", uintptr(unsafe.Pointer(a)))
    println("aa after morestack", uintptr(unsafe.Pointer(aa)))
    println("v1 after morestack", uintptr(unsafe.Pointer(&v1)))
}

// for morestack
func cc(i int){
    i++
    if i >= 30 {
        println("morestack done")
    }else{
        cc(i)
    }
}

func main()  {
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        var a, aa int
        a = 1000
        aa = 1000

        println("a before morestack", uintptr(unsafe.Pointer(&a)))
        println("aa before morestack", uintptr(unsafe.Pointer(&aa)))

        bb(&a, &aa)
        wg.Done()
    }()
    wg.Wait()
}

Result:

a before morestack 824633925560
aa before morestack 824633925552
v1 before morestack 824633925504
morestack done
a after morestack 824634142648
aa after morestack 824634142640
v1 after morestack 824634142592

From the results, we can see that bb's parameters a, a A and variable v1 address have changed after expansion. How does this change happen?We will focus on the following three issues:

How to confirm the location of the function Frame
How to find function parameters, pointers to variables
How to confirm the Frame of a parent function

from gentraceback start

func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max int, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer, flags uint) int {
    ......
    g := getg()
    ......
    if pc0 == ^uintptr(0) && sp0 == ^uintptr(0) { // Signal to fetch saved values from gp.
        if gp.syscallsp != 0 {
            ......
        } else {
            //Running Location
            pc0 = gp.sched.pc
            sp0 = gp.sched.sp
            ......
        }
    }
    nprint := 0
    var frame stkframe
    frame.pc = pc0
    frame.sp = sp0
    ......
    f := findfunc(frame.pc)
    ......
    frame.fn = f

    n := 0
    for n < max {
        ......
        f = frame.fn
        if f.pcsp == 0 {
            // No frame information, must be external function, like race support.
            // See golang.org/issue/13568.
            break
        }
        ......
        if frame.fp == 0 {
            sp := frame.sp
            ......
            //Calculate FP
            frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache))
            if !usesLR {
                // On x86, call instruction pushes return PC before entering new function.
                frame.fp += sys.RegSize
            }
        }
        var flr funcInfo
        if topofstack(f, gp.m != nil && gp == gp.m.g0) {
            ......
        } else if usesLR && f.funcID == funcID_jmpdefer {
            ......
        } else {
            var lrPtr uintptr
            if usesLR {
                ......
            } else {
                if frame.lr == 0 {
                    //Gets the PC value of the calling function
                    lrPtr = frame.fp - sys.RegSize
                    frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
                }
            }
            flr = findfunc(frame.lr)
            ......
        }

        frame.varp = frame.fp
        if !usesLR {
            // On x86, call instruction pushes return PC before entering new function.
            frame.varp -= sys.RegSize
        }
        ......
        if framepointer_enabled && GOARCH == "amd64" && frame.varp > frame.sp {
            frame.varp -= sys.RegSize
        }
        ......
        if callback != nil || printing {
            frame.argp = frame.fp + sys.MinFrameSize
            ......
        }
        ......
        //Currently adjusting frame
        if callback != nil {
            if !callback((*stkframe)(noescape(unsafe.Pointer(&frame))), v) {
                return n
            }
        }
        ......
        n++
    skipped:
        ......
    //Confirm Parent Frame
        // Unwind to next frame.
        frame.fn = flr
        frame.pc = frame.lr
        frame.lr = 0
        frame.sp = frame.fp
        frame.fp = 0
        frame.argmap = nil
        ......
    }
    ......
    return n
}

gentraceback There's a lot of code, so here's a simplification of adjusting the parameters passed in to the Frame and what we're going to explore.It's still a long time after simplification. Don't worry, we'll strip out this function one by one.

Confirm Current Location

Go runtime has saved the PC to gp.sched.pc and SP to gp.sched.sp when scaling occurred.
Find Function Information

Function parameters, number of variables, frame size, file line and other information, compiled and saved into the execution file, when executed, loaded into memory, this part of the data can be obtained by PC: findfunc -> findmoduledatap
```
func findmoduledatap(pc uintptr) *moduledata {
       for datap := &firstmoduledata; datap != nil; datap = datap.next {
           if datap.minpc <= pc && pc < datap.maxpc {
               return datap
           }
       }
       return nil
}
```
Calculate FP

frame.fp = sp + uintptr(funcspdelta(f, frame.pc, &cache))

SP can be understood as the top of the function, FP is the bottom of the function, with SP, there is no function length (frame size).Actually we can get it from pcsp because it has been mapped into memory. See Go 1.2 Runtime Symbol Information .Knowing FP and SP, we know where the function is on the cockpit.

Get parent function PC directive (LR)
```
lrPtr = frame.fp - sys.RegSize
frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
```
The PC instruction of the parent function is placed in the return address position of the stack frame graph, and we can take it out directly, according to this instruction we get the information of the parent function.
Confirm Parent Function Frame

frame.fn = flr
frame.pc = frame.lr
frame.lr = 0
frame.sp = frame.fp
frame.fp = 0
frame.argmap = nil

From the stack frame diagram, you can see that the FP of the child function is equal to the SP of the parent function.Knowing the SP and PC of the parent function, repeat the steps above to find out the entire call chain where the function is located, which is what we usually see in the call chain of panic.

with adjustframe End

func adjustframe(frame *stkframe, arg unsafe.Pointer) bool {
    adjinfo := (*adjustinfo)(arg)
    ......
    f := frame.fn
    ......
    locals, args := getStackMap(frame, &adjinfo.cache, true)
    // Adjust local variables if stack frame has been allocated.
    if locals.n > 0 {
        size := uintptr(locals.n) * sys.PtrSize
        adjustpointers(unsafe.Pointer(frame.varp-size), &locals, adjinfo, f)
    }

    // Adjust saved base pointer if there is one.
    if sys.ArchFamily == sys.AMD64 && frame.argp-frame.varp == 2*sys.RegSize {
        ......
        adjustpointer(adjinfo, unsafe.Pointer(frame.varp))
    }
    // Adjust arguments.
    if args.n > 0 {
        ......
        adjustpointers(unsafe.Pointer(frame.argp), &args, adjinfo, f)
    }
    return true
}

adopt gentraceback Get the frame's exact location on the stack, combine Stack frame layout So we know the function parameter argp and the variable varp address.On 64-bit systems, each pointer takes up 8 bytes.With 8 as the step size, you can derive and adjust the pointers in the function parameters and variables.

Come here, the source code analysis of the consortium stack has been completed. From the above we know how to implement the continuous stack, and have a lot of benefits. Next, look at the shortcomings and benefits of the continuous stack.

Disadvantages of continuous stacks

Continuous stacks solve two problems with piecewise stacks, but there are other problems with this implementation:

More virtual memory fragmentation.Especially if you need a larger stack, allocating a contiguous block of memory becomes more difficult
Pointers are restricted to the stack.Pointers for two protocols are not allowed to point to each other in go.This increases the complexity of the implementation.

Profit

This data comes from Contiguous stacks.

Stack growth doubled by 10%, increased by 50% by only 2%, and increased by 25% by 20%.
Hot split performance issue.

segmented stacks:

no split: 1.25925147s
with split: 5.372118558s   <- Here we go hot split problem
both split: 1.293200571s

contiguous stacks:

no split: 1.261624848s
with split: 1.262939769s
both split: 1.29008309s

Link

Keywords: Go less REST Assembly Language

Added by myfafa on Mon, 24 Jun 2019 19:35:24 +0300

Programming VIP