
《Golang》Golang Garbage Collection

The GC (garbage collection) in the Golang runtime is a very complex module. Below is a summary of how it works.

The evolution of GC

  1. Mark-and-sweep. STW -> mark -> sweep -> release STW. The STW (Stop The World) pause causes serious performance problems.
  2. Tri-color concurrent marking. Introduces the tri-color abstraction plus a write barrier (the object being added as a reference is shaded grey), which together enable marking concurrently with the mutator.
  3. Hybrid write barrier. Combines the strengths of the insertion barrier and the deletion barrier (the object whose reference is deleted is shaded grey), further shrinking STW time.

Tri-color mark and sweep

  • White set: objects whose memory can be reclaimed (i.e. the garbage that will be swept at the end).
  • Grey set: objects that are reachable from the roots but whose own references have not yet been scanned. Because they are reachable, they must not be collected by the collector; once scanned, they move into the black set.
  • Black set: objects that are reachable from the roots and provably have no outgoing references to objects in the white set. Objects in the black set are never collected.

Initially the black set is empty, the grey set holds the objects referenced directly by the roots, and the white set contains everything else.

The marking process is:

  1. Pick an object from the grey set and move it to the black set.
  2. Move every white object it references into the grey set.
  3. Repeat the first two steps until the grey set is empty.

Barrier techniques

Stack variables may reference heap objects, but heap objects never reference stack variables, because stack memory is not managed by the GC: it is reclaimed automatically when the frame exits.

Stack variables can, however, reference other stack variables. For example, within the same function (i.e. the same stack frame), a and b are both allocated on the stack and a.a = &b.

If a stack variable would be referenced by something outside its function, it escapes to the heap (this is determined at compile time by escape analysis). The stack frame disappears when the function returns and everything in it is reclaimed, so a reference from outside the function would otherwise become invalid.

package main

import (
	"fmt"
)

func main() {
	a := 1
	b := &a
	fmt.Println(b)
}

**go build -gcflags="-m" ./bin/main/**

Output:

bin/main/main.go:18:13: inlining call to fmt.Println
bin/main/main.go:16:2: moved to heap: a
bin/main/main.go:18:13: []interface {} literal does not escape
<autogenerated>:1: .this does not escape
<autogenerated>:1: leaking param: .this
<autogenerated>:1: leaking param content: .this
<autogenerated>:1: leaking param: .this

You can see that variable a escaped to the heap.

So objects on the stack can never be referenced from the heap, and stack objects are always treated as black.

Barrier techniques are therefore applied only to the heap, never to the stack.

Dijkstra write barrier

First, two concurrency problems that arise when there is no write barrier:

A is a root object. When the GC visits A, A turns black and B turns grey. The mutator then changes A's reference from B to C. The GC next visits B; B references nothing, so B turns black. The grey set is now empty, marking ends, and C, still in the white set, gets swept, even though A references it.

Obj is a root struct. When the GC visits Obj, Obj turns black. The mutator then stores a new object C into a field of Obj; C starts out white. When marking finishes, C is swept.

These examples show that concurrent marking (no STW; the mark goroutines run alongside the other goroutines) can collect objects that must not be collected.

To fix this, the write barrier (Write Barrier) was introduced. A write barrier is in effect a snippet of code inserted at every pointer-write site: when the mutator mutates a reference and a black object would end up referencing a white object, the white object is moved into the grey set; objects newly allocated during the cycle are placed directly into the black set.

In short, the Dijkstra write barrier says: shade the object being added grey.

Yuasa deletion barrier

As before, here is an example where concurrency goes wrong:

Suppose that after marking has run for a while, three objects are in this state:

a(grey) -> b(white) -> c(white)

The mark goroutine runs concurrently with goroutine g1. g1 first deletes the reference from a to b, giving:

a(grey)    b(white) -> c(white)

The mark goroutine continues: a turns black, b and c stay white:

a(black)    b(white) -> c(white)

Now g1 creates a new object d (allocated black) that points to b. (Note: with a write barrier, b would be shaded grey here and there would be no problem; without one, there is.)

The mark goroutine continues, finds no grey objects left, and finishes marking. The sweep then collects b and c, even though d references them.

With a deletion barrier, b is shaded grey when a's reference to it is deleted, so by the end of this GC cycle b and c are both black. If no other object re-references b during this cycle, b and c should have been collected this round; instead they survive until the next cycle. That is why the deletion barrier has lower collection precision.

In this scheme the collector pessimistically assumes that every object whose reference is deleted may still be referenced by a black object.

In short, the Yuasa deletion barrier says: shade the object being deleted grey.

When GC is triggered

  • During memory allocation: when allocating an object larger than 32 KB, or when a small-object allocation finds its span already full, a GC may be triggered (the mallocgc function in src/runtime/malloc.go).
  • An explicit call to runtime.GC(). This is blocking: the caller is stopped until the collection completes.
  • The sysmon background thread monitors the runtime state; if no GC has been triggered for two minutes, one is forced.

The GC process

  1. Sweep Termination: sweep any unswept spans; a new GC cycle can start only after the previous cycle's sweeping has finished.
  2. Mark: scan all root objects and everything reachable from them, marking them as not to be collected.
  3. Mark Termination: finish the marking work and rescan part of the roots (requires STW).
  4. Sweep: sweep spans according to the mark results.

A full GC cycle stops the world (Stop The World) twice: first at the start of the Mark phase, and again in the Mark Termination phase.
The first STW prepares the root scan and enables the write barrier and mutator assists.
The second STW rescans part of the roots and disables the write barrier and mutator assists.

Several GC-related goroutines

When GC starts, a background mark worker goroutine is started for each P (if one does not already exist); within one GC cycle, however, the mark workers across all Ps are limited to roughly 25% of total CPU time.

The background scavenger goroutine, the background sweeper goroutine, and the forced-GC goroutine are all created at program startup, and thereafter cycle between being woken (the sweeper is woken by the mark work, the forced-GC goroutine by sysmon) and going back to sleep.

Source code analysis

In runtime.main, the gcenable function starts the background scavenger goroutine and the background sweeper goroutine.

func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	go bgsweep(c)    // start the GC sweeper goroutine
	go bgscavenge(c) // start the GC scavenger goroutine
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

func bgsweep(c chan int) {
	sweep.g = getg()

	lock(&sweep.lock)
	sweep.parked = true
	c <- 1
	goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1) // park first; woken later by goready(sweep.g)

	for {
		for sweepone() != ^uintptr(0) { // sweep spans one by one until none are left to sweep
			sweep.nbgsweep++
			Gosched() // yield the CPU
		}
		for freeSomeWbufs(true) { // free workbufs
			Gosched()
		}
		lock(&sweep.lock)
		if !isSweepDone() { // check whether sweeping is done; if not, keep going
			// This can happen if a GC runs between
			// gosweepone returning ^0 above
			// and the lock being acquired.
			unlock(&sweep.lock)
			continue
		}
		sweep.parked = true
		goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1) // park this g and reenter the scheduler
	}
}

func bgscavenge(c chan int) {
	scavenge.g = getg()

	lock(&scavenge.lock)
	scavenge.parked = true

	scavenge.timer = new(timer)
	scavenge.timer.f = func(_ interface{}, _ uintptr) {
		wakeScavenger()
	}

	c <- 1
	goparkunlock(&scavenge.lock, waitReasonGCScavengeWait, traceEvGoBlock, 1) // park first; woken later by goready(scavenge.g)

	// Exponentially-weighted moving average of the fraction of time this
	// goroutine spends scavenging (that is, percent of a single CPU).
	// It represents a measure of scheduling overheads which might extend
	// the sleep or the critical time beyond what's expected. Assume no
	// overhead to begin with.
	//
	// TODO(mknyszek): Consider making this based on total CPU time of the
	// application (i.e. scavengePercent * GOMAXPROCS). This isn't really
	// feasible now because the scavenger acquires the heap lock over the
	// scavenging operation, which means scavenging effectively blocks
	// allocators and isn't scalable. However, given a scalable allocator,
	// it makes sense to also make the scavenger scale with it; if you're
	// allocating more frequently, then presumably you're also generating
	// more work for the scavenger.
	const idealFraction = scavengePercent / 100.0
	scavengeEWMA := float64(idealFraction)

	for {
		released := uintptr(0)

		// Time in scavenging critical section.
		crit := float64(0)

		// Run on the system stack since we grab the heap lock,
		// and a stack growth with the heap lock means a deadlock.
		systemstack(func() {
			lock(&mheap_.lock)

			// If background scavenging is disabled or if there's no work to do just park.
			retained, goal := heapRetained(), mheap_.scavengeGoal
			if retained <= goal {
				unlock(&mheap_.lock)
				return
			}
			unlock(&mheap_.lock)

			// Scavenge one page, and measure the amount of time spent scavenging.
			start := nanotime()
			released = mheap_.pages.scavengeOne(physPageSize, false)
			atomic.Xadduintptr(&mheap_.pages.scavReleased, released)
			crit = float64(nanotime() - start)
		})

		if released == 0 {
			lock(&scavenge.lock)
			scavenge.parked = true
			goparkunlock(&scavenge.lock, waitReasonGCScavengeWait, traceEvGoBlock, 1)
			continue
		}

		// Multiply the critical time by 1 + the ratio of the costs of using
		// scavenged memory vs. scavenging memory. This forces us to pay down
		// the cost of reusing this memory eagerly by sleeping for a longer period
		// of time and scavenging less frequently. More concretely, we avoid situations
		// where we end up scavenging so often that we hurt allocation performance
		// because of the additional overheads of using scavenged memory.
		crit *= 1 + scavengeCostRatio

		// If we spent more than 10 ms (for example, if the OS scheduled us away, or someone
		// put their machine to sleep) in the critical section, bound the time we use to
		// calculate at 10 ms to avoid letting the sleep time get arbitrarily high.
		const maxCrit = 10e6
		if crit > maxCrit {
			crit = maxCrit
		}

		// Compute the amount of time to sleep, assuming we want to use at most
		// scavengePercent of CPU time. Take into account scheduling overheads
		// that may extend the length of our sleep by multiplying by how far
		// off we are from the ideal ratio. For example, if we're sleeping too
		// much, then scavengeEMWA < idealFraction, so we'll adjust the sleep time
		// down.
		adjust := scavengeEWMA / idealFraction
		sleepTime := int64(adjust * crit / (scavengePercent / 100.0))

		// Go to sleep.
		slept := scavengeSleep(sleepTime)

		// Compute the new ratio.
		fraction := crit / (crit + float64(slept))

		// Set a lower bound on the fraction.
		// Due to OS-related anomalies we may "sleep" for an inordinate amount
		// of time. Let's avoid letting the ratio get out of hand by bounding
		// the sleep time we use in our EWMA.
		const minFraction = 1 / 1000
		if fraction < minFraction {
			fraction = minFraction
		}

		// Update scavengeEWMA by merging in the new crit/slept ratio.
		const alpha = 0.5
		scavengeEWMA = alpha*fraction + (1-alpha)*scavengeEWMA
	}
}

The forced-GC goroutine is started in the runtime package's init function; each package's init functions are themselves executed from runtime.main.

func init() {
	go forcegchelper()
}

func forcegchelper() {
	forcegc.g = getg()
	for {
		lock(&forcegc.lock)
		if forcegc.idle != 0 {
			throw("forcegc: phase error")
		}
		atomic.Store(&forcegc.idle, 1)
		goparkunlock(&forcegc.lock, waitReasonForceGCIdle, traceEvGoBlock, 1) // park and wait for sysmon to wake us
		// this goroutine is explicitly resumed by sysmon
		if debug.gctrace > 0 {
			println("GC forced")
		}
		// Time-triggered, fully concurrent.
		gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
	}
}

The forced-GC goroutine is woken from sysmon. Here is the relevant part of the sysmon function:

// check if we need to force a GC
if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 { // if no GC for two minutes (checked inside test()) and forcegc is idle
	lock(&forcegc.lock)
	forcegc.idle = 0 // mark it busy
	var list gList
	list.push(forcegc.g)
	injectglist(&list) // put the forcegc goroutine on the runnable queue
	unlock(&forcegc.lock)
}

Everything above is essentially scheduling plumbing; the real GC work begins in the gcStart function.

func gcStart(trigger gcTrigger) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed in to the next GC cycle.
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	// Perform GC initialization and the sweep termination
	// transition.
	semacquire(&work.startSema)
	// Re-check transition condition under transition lock.
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}

	// For stats, check if this GC was forced by the user.
	work.userForced = trigger.kind == gcTriggerCycle

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	mode := gcBackgroundMode
	if debug.gcstoptheworld == 1 {
		mode = gcForceMode
	} else if debug.gcstoptheworld == 2 {
		mode = gcForceBlockMode
	}

	// Ok, we're doing it! Stop everybody else
	semacquire(&worldsema)

	if trace.enabled {
		traceGCStart()
	}

	// Check that all Ps have finished deferred mcache flushes.
	for _, p := range allp {
		if fg := atomic.Load(&p.mcache.flushGen); fg != mheap_.sweepgen {
			println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
			throw("p mcache not flushed")
		}
	}

	gcBgMarkStartWorkers() // check all Ps and make sure each has a background mark worker goroutine

	systemstack(gcResetMarkState)

	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		// This is used to compute CPU time of the STW phases,
		// so it can't be more than ncpu, even if GOMAXPROCS is.
		work.stwprocs = ncpu
	}
	work.heap0 = atomic.Load64(&memstats.heap_live)
	work.pauseNS = 0
	work.mode = mode

	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	if trace.enabled {
		traceGCSTWStart(1)
	}
	systemstack(stopTheWorldWithSema) // on the g0 stack, STW stops all Ps; after this line the other Ms are asleep
	// Finish sweep before we start concurrent scan.
	systemstack(func() {
		finishsweep_m() // finish sweeping every sweepable span
	})
	// clearpools before we start the GC. If we wait they memory will not be
	// reclaimed until the next GC cycle.
	clearpools()

	work.cycles++

	gcController.startCycle() // begin a GC cycle
	work.heapGoal = memstats.next_gc

	// In STW mode, disable scheduling of user Gs. This may also
	// disable scheduling of this goroutine, so it may block as
	// soon as we start the world again.
	if mode != gcBackgroundMode {
		schedEnableUser(false)
	}

	// Enter concurrent mark phase and enable
	// write barriers.
	//
	// Because the world is stopped, all Ps will
	// observe that write barriers are enabled by
	// the time we start the world and begin
	// scanning.
	//
	// Write barriers must be enabled before assists are
	// enabled because they must be enabled before
	// any non-leaf heap objects are marked. Since
	// allocations are blocked until assists can
	// happen, we want enable assists as early as
	// possible.
	setGCPhase(_GCmark)

	gcBgMarkPrepare() // Must happen before assist enable.
	gcMarkRootPrepare()

	// Mark all active tinyalloc blocks. Since we're
	// allocating from these, they need to be black like
	// other allocations. The alternative is to blacken
	// the tiny block on every allocation from it, which
	// would slow down the tiny allocator.
	gcMarkTinyAllocs() // mark the active tiny-alloc blocks in every P's mcache

	// At this point all Ps have enabled the write
	// barrier, thus maintaining the no white to
	// black invariant. Enable mutator assists to
	// put back-pressure on fast allocating
	// mutators.
	atomic.Store(&gcBlackenEnabled, 1)

	// Assists and workers can start the moment we start
	// the world.
	gcController.markStartTime = now

	// Concurrent mark.
	systemstack(func() {
		now = startTheWorldWithSema(trace.enabled) // restart the world; after this line all Ms are woken
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	})
	// In STW mode, we could block the instant systemstack
	// returns, so don't do anything important here. Make sure we
	// block rather than returning to user code.
	if mode != gcBackgroundMode {
		Gosched() // reenter the scheduler to yield the CPU
	}

	semrelease(&work.startSema)
}

Coming next

Golang lock internals



