How did the performance of Go1.13 defer improve?

Go1.13 has finally been released. One of its noteworthy changes is that defer is now roughly 30% faster in most scenarios, but the release notes do not explain how this was achieved, which left many people curious. I had previously written articles such as "Deep Understanding Go defer" and "Does Go defer degrade performance, and should you avoid it?", so I was interested in what changed to produce this result. Today let's explore the mystery together.

Original address: How did the performance of Go1.13 defer improve?

1. Testing

Go1.12:
$ go test -bench=. -benchmem -run=none
goos: darwin
goarch: amd64
BenchmarkDoDefer-4          20000000            91.4 ns/op          48 B/op           1 allocs/op
BenchmarkDoNotDefer-4       30000000            41.6 ns/op          48 B/op           1 allocs/op
ok    3.234s

Go1.13:
$ go test -bench=. -benchmem -run=none
goos: darwin
goarch: amd64
BenchmarkDoDefer-4          15986062            74.7 ns/op          48 B/op           1 allocs/op
BenchmarkDoNotDefer-4       29231842            40.3 ns/op          48 B/op           1 allocs/op
ok    3.444s

To begin with, I re-ran my earlier test cases as an informal (non-official) benchmark against both versions. defer performance has indeed improved in Go1.13, although in this micro-benchmark the gain looks closer to 20% than the advertised 30%.
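
For reference, the benchmark has roughly the following shape (a reconstruction matching the DoDefer/DoNotDefer names in the output above; the original source file is not shown here, so treat the details as an assumption):

```go
package main

import (
	"fmt"
	"testing"
)

// DoDefer performs the work inside a deferred closure; DoNotDefer does
// the same work inline. The string concatenation gives each op a small
// heap allocation, mirroring the 1 allocs/op in the results above.
func DoDefer(key, value string) {
	defer func(k, v string) {
		_ = k + v
	}(key, value)
}

func DoNotDefer(key, value string) {
	_ = key + value
}

func main() {
	withDefer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			DoDefer("go", "defer")
		}
	})
	withoutDefer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			DoNotDefer("go", "defer")
		}
	})
	fmt.Println("DoDefer:   ", withDefer)
	fmt.Println("DoNotDefer:", withoutDefer)
}
```

Running it under Go1.12 and Go1.13 in turn gives numbers comparable to the two result blocks above; the absolute values will differ per machine.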

2. Look at the assembly

Before (Go1.12)

    0x0070 00112 (main.go:6)    CALL    runtime.deferproc(SB)
    0x0075 00117 (main.go:6)    TESTL    AX, AX
    0x0077 00119 (main.go:6)    JNE    137
    0x0079 00121 (main.go:7)    XCHGL    AX, AX
    0x007a 00122 (main.go:7)    CALL    runtime.deferreturn(SB)
    0x007f 00127 (main.go:7)    MOVQ    56(SP), BP

Now (Go1.13)

    0x006e 00110 (main.go:4)    MOVQ    AX, (SP)
    0x0072 00114 (main.go:4)    CALL    runtime.deferprocStack(SB)
    0x0077 00119 (main.go:4)    TESTL    AX, AX
    0x0079 00121 (main.go:4)    JNE    139
    0x007b 00123 (main.go:7)    XCHGL    AX, AX
    0x007c 00124 (main.go:7)    CALL    runtime.deferreturn(SB)
    0x0081 00129 (main.go:7)    MOVQ    112(SP), BP

From the assembly, the call to runtime.deferproc has been replaced with a call to runtime.deferprocStack. Is there an optimization behind this change? Let's keep digging.
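
The listings above come from compiling a trivial function containing a single defer. Any function of the following shape (my own minimal example, not necessarily the author's exact file) reproduces them via `go tool compile -S main.go`: on Go1.12 the defer lowers to runtime.deferproc, on Go1.13 to runtime.deferprocStack.

```go
package main

import "fmt"

// doWork defers its cleanup at the top level of the function, the
// pattern that Go 1.13 lowers to runtime.deferprocStack.
func doWork() string {
	result := "working"
	defer fmt.Println("cleanup runs after the return value is set")
	return result
}

func main() {
	fmt.Println(doWork())
}
```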

3. Look at the runtime source


type _defer struct {
    siz     int32 // includes both arguments and results
    started bool
    heap    bool
    sp      uintptr // sp at time of defer
    pc      uintptr
    fn      *funcval
    _panic  *_panic // panic that is running defer
    link    *_defer
}

Compared with previous versions, the _defer structure (the smallest unit of a defer) mainly adds a heap field, which identifies whether this _defer record is allocated on the stack or on the heap; the remaining fields are essentially unchanged. Let's focus on the stack-allocation path to see what it does.


func deferprocStack(d *_defer) {
    gp := getg()
    if gp.m.curg != gp {
        throw("defer on system stack")
    }
    d.started = false
    d.heap = false
    d.sp = getcallersp()
    d.pc = getcallerpc()

    *(*uintptr)(unsafe.Pointer(&d._panic)) = 0
    *(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
    *(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))
}

This code is fairly routine: it records the stack pointer and PC (program counter) of the function calling defer and links the record into the goroutine's defer chain. I covered this in detail in the earlier article "Deep Understanding Go defer", so I won't repeat it here.

So what's special about deferprocStack? Notice that it sets d.heap to false; in other words, the deferprocStack method handles the scenario where the _defer record is allocated on the stack.


So the question arises, where does it handle the application scenarios allocated to the heap?

func newdefer(siz int32) *_defer {
    ...
    d.heap = true
    d.link = gp._defer
    gp._defer = d
    return d
}
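
Both paths prepend the new record to the head of the goroutine's defer list (gp._defer = d), which is why deferred calls run in last-in-first-out order. A quick user-level check of that behavior:

```go
package main

import "fmt"

// deferOrder shows that _defer records are pushed onto the head of the
// goroutine's defer list: they run in last-in-first-out order.
func deferOrder() []int {
	order := []int{}
	func() {
		for i := 1; i <= 3; i++ {
			defer func(n int) {
				order = append(order, n)
			}(i)
		}
	}()
	return order
}

func main() {
	fmt.Println(deferOrder()) // [3 2 1]
}
```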

And where is newdefer called? As follows:

func deferproc(siz int32, fn *funcval) { // arguments of fn follow fn
    sp := getcallersp()
    argp := uintptr(unsafe.Pointer(&fn)) + unsafe.Sizeof(fn)
    callerpc := getcallerpc()

    d := newdefer(siz)
    ...
}

Clearly, the deferproc method that was always invoked in previous versions now corresponds to the heap-allocation scenario.


  • First point: deferproc has not been removed; rather, the process around it has been optimized.
  • Second point: the compiler chooses between the deferproc and deferprocStack methods based on the scenario, handling heap allocation and stack allocation respectively.

4. How the compiler chooses


// src/cmd/compile/internal/gc/esc.go
case ODEFER:
    if e.loopdepth == 1 { // top level
        n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
        break
    }


// src/cmd/compile/internal/gc/ssa.go
case ODEFER:
    d := callDefer
    if n.Esc == EscNever {
        d = callDeferStack
    }
    s.call(n.Left, d)


The core of this pair of snippets is that when e.loopdepth == 1, the escape-analysis result n.Esc is set to EscNever, that is, the _defer record is allocated on the stack. But what exactly is this e.loopdepth? It tracks the loop nesting depth during escape analysis. We can confirm this with the following code:

func main() {
    for p := 0; p < 10; p++ {
        defer func() {
            for i := 0; i < 20; i++ {
                // ...
            }
        }()
    }
}

View the compilation:

$ go tool compile -S main.go
"".main STEXT size=122 args=0x0 locals=0x20
    0x0000 00000 (main.go:15)    TEXT    "".main(SB), ABIInternal, $32-0
    0x0048 00072 (main.go:17)    CALL    runtime.deferproc(SB)
    0x004d 00077 (main.go:17)    TESTL    AX, AX
    0x004f 00079 (main.go:17)    JNE    83
    0x0051 00081 (main.go:17)    JMP    33
    0x0053 00083 (main.go:17)    XCHGL    AX, AX
    0x0054 00084 (main.go:17)    CALL    runtime.deferreturn(SB)

Clearly, the defer here ends up calling the runtime.deferproc method, i.e. it is allocated on the heap. That is exactly what we expect: the defer statement sits inside a for loop, so its loopdepth is greater than 1.
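
For contrast, if the defer is hoisted out of the loop so that it sits at loopdepth 1, compiling with go tool compile -S shows runtime.deferprocStack again. A variant of the example above (my own contrasting sketch, not from the article):

```go
package main

import "fmt"

var cleanups int

// run keeps its defer at the top level of the function (loopdepth == 1),
// so on Go 1.13 the _defer record is stack-allocated via
// runtime.deferprocStack.
func run() {
	defer func() {
		cleanups++
	}()

	for p := 0; p < 10; p++ {
		// work that does not defer inside the loop
	}
}

func main() {
	run()
	fmt.Println("cleanups:", cleanups)
}
```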


5. Summary

From this analysis, the roughly 30% defer performance improvement in Go1.13 comes mainly from a change in the allocation rules for _defer records: the compiler examines the loop depth at which the defer statement appears. If loopdepth is 1, the escape-analysis result is set to EscNever and the record is allocated on the stack; otherwise, it is allocated on the heap.

For most usage scenarios this is a real optimization, and it resolves the defer performance complaints some people had. That said, starting with Go1.13 it pays to know a little about the mechanism: don't bury defer statements inside deeply nested loops, or you may not get the full benefit.
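
One general way to keep a defer stack-allocated (a common Go pattern, not something from the article) is to move the loop body into its own function so the defer sits at loop depth 1:

```go
package main

import "os"

// openAll defers inside the loop: every _defer record escapes to the
// heap (loopdepth > 1), and no file is closed until openAll returns.
func openAll(paths []string) {
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			continue
		}
		defer f.Close() // heap-allocated record; runs only at function exit
	}
}

// openOne keeps the defer at the top level of its own function, so the
// record can live on the stack and the file is closed after each call.
func openOne(p string) error {
	f, err := os.Open(p)
	if err != nil {
		return err
	}
	defer f.Close() // stack-allocated in Go 1.13
	return nil
}

func main() {
	for _, p := range []string{"a.txt", "b.txt"} {
		_ = openOne(p)
	}
}
```

Besides the allocation question, the per-call variant also releases each resource promptly instead of holding all of them until the loop finishes.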

If you want more details, have a look at the defer-related commit (the linked submission); the official test cases are included there as well.


Posted on Sat, 07 Sep 2019 06:15:32 -0700 by mykg4orce