When I first learned C and C++, memory management was left entirely to the developer. That approach is the most flexible, but also the most error-prone, and it demands a great deal from the programmer. Later, higher-level languages such as Java, JavaScript, C#, and Go solved memory allocation and reclamation at the language level, lowering the barrier to entry and freeing up productivity, but at the cost of hiding the machinery from anyone who wants to understand it in depth. This article organizes my own understanding from the perspective of memory allocation; Go's garbage collection mechanism will be covered in follow-up articles.
Different levels of objects are distinguished in TCMalloc, corresponding to different allocation processes:
- Small objects: 0~256KB. Allocation path: ThreadCache -> CentralCache -> PageHeap. Most of the time the ThreadCache alone is sufficient, so no access to CentralCache or PageHeap, no locks, and no system calls are needed, making allocation very fast.
- Medium objects: 256KB~1MB. Allocation path: pick a span of a suitable size directly from PageHeap; the largest such span, at 128 pages, holds 1MB.
- Large objects: >1MB. Allocation path: assemble a span from an appropriate number of pages in the large-span set and store the data there.
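To make the three tiers concrete, here is a small illustrative sketch in Go. The function and constant names (`allocPath`, `smallMax`, `mediumMax`) are made up for this example; only the thresholds come from the list above, and this is not TCMalloc's real dispatch code.

```go
package main

import "fmt"

// Thresholds from the tier list above (assumed names).
const (
	smallMax  = 256 << 10 // 256 KB: served from ThreadCache
	mediumMax = 1 << 20   // 1 MB: served directly from a PageHeap span
)

// allocPath returns which tier a request of the given size would hit.
func allocPath(size int) string {
	switch {
	case size <= smallMax:
		return "ThreadCache -> CentralCache -> PageHeap"
	case size <= mediumMax:
		return "PageHeap (span of up to 128 pages)"
	default:
		return "PageHeap large-span set"
	}
}

func main() {
	fmt.Println(allocPath(64))        // small object
	fmt.Println(allocPath(512 << 10)) // medium object
	fmt.Println(allocPath(4 << 20))   // large object
}
```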
In addition, when memory is released TCMalloc merges adjacent small free regions back into larger ones. If you are interested, you can read this article:
TCMalloc decryption
Note that the sweep budget in the code below is 100 spans: beyond that, further sweeping is judged not worth the time it costs, and the mcentral gives up and requests a fresh span from mheap instead.
If a span with free space is found, control jumps to the havespan label, where the freeindex and allocCache bitmap caches are updated and the span is returned.
// Allocate a span to use in an mcache.
func (c *mcentral) cacheSpan() *mspan {
	// Deduct credit for this span allocation and sweep if necessary.
	spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
	deductSweepCredit(spanBytes, 0)

	traceDone := false
	if trace.enabled {
		traceGCSweepStart()
	}

	spanBudget := 100

	var s *mspan
	sl := newSweepLocker()
	sg := sl.sweepGen

	// Try partial swept spans first.
	if s = c.partialSwept(sg).pop(); s != nil {
		goto havespan
	}

	// Now try partial unswept spans.
	for ; spanBudget >= 0; spanBudget-- {
		s = c.partialUnswept(sg).pop()
		if s == nil {
			break
		}
		if s, ok := sl.tryAcquire(s); ok {
			// We got ownership of the span, so let's sweep it and use it.
			s.sweep(true)
			sl.dispose()
			goto havespan
		}
	}

	// Now try full unswept spans, sweeping them and putting them into the
	// right list if we fail to get a span.
	for ; spanBudget >= 0; spanBudget-- {
		s = c.fullUnswept(sg).pop()
		if s == nil {
			break
		}
		if s, ok := sl.tryAcquire(s); ok {
			// We got ownership of the span, so let's sweep it.
			s.sweep(true)
			// Check if there's any free space.
			freeIndex := s.nextFreeIndex()
			if freeIndex != s.nelems {
				s.freeindex = freeIndex
				sl.dispose()
				goto havespan
			}
			// Add it to the swept list, because sweeping didn't give us any free space.
			c.fullSwept(sg).push(s.mspan)
		}
		// See comment for partial unswept spans.
	}
	sl.dispose()

	if trace.enabled {
		traceGCSweepDone()
		traceDone = true
	}

	// We failed to get a span from the mcentral so get one from mheap.
	s = c.grow()
	if s == nil {
		return nil
	}

	// At this point s is a span that should have free slots.
havespan:
	if trace.enabled && !traceDone {
		traceGCSweepDone()
	}
	n := int(s.nelems) - int(s.allocCount)
	if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
		throw("span has no free objects")
	}
	freeByteBase := s.freeindex &^ (64 - 1)
	whichByte := freeByteBase / 8
	// Init alloc bits cache.
	s.refillAllocCache(whichByte)

	// Adjust the allocCache so that s.freeindex corresponds to the low bit in
	// s.allocCache.
	s.allocCache >>= s.freeindex % 64

	return s
}
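The allocCache manipulation at the end of cacheSpan is easier to see in isolation. Below is an illustrative model (nextFree is a hypothetical helper, not runtime code) of how a 64-bit bitmap in which 1 means "slot free" yields the next free slot via math/bits.TrailingZeros64, which is the same primitive mspan.nextFreeIndex relies on:

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextFree scans a 64-bit allocCache where a 1 bit means "slot free".
// freeindex is the object index that bit 0 of the cache maps to, mirroring
// the shift done at the end of cacheSpan.
func nextFree(allocCache uint64, freeindex uintptr) (idx uintptr, newCache uint64) {
	z := uintptr(bits.TrailingZeros64(allocCache)) // skip allocated (0) bits
	idx = freeindex + z
	// Shift the consumed bits away so bit 0 maps to slot idx+1 next time.
	// (Go defines shifts >= 64 to produce 0, so an empty cache is safe.)
	newCache = allocCache >> (z + 1)
	return idx, newCache
}

func main() {
	// Slots 0-2 allocated, everything from slot 3 on free: low bits ...11111000.
	idx, cache := nextFree(0xFFFFFFFFFFFFFFF8, 0)
	fmt.Println(idx) // 3
	idx, _ = nextFree(cache, idx+1)
	fmt.Println(idx) // 4
}
```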
On the return path, when mcache decides that a span of the current size class can no longer satisfy the sizes being requested, it hands the span back to mcentral. In uncacheSpan, mcentral first checks whether the span is stale (cached before the current sweep began), in which case it must be swept immediately and sweeping itself files it on the right list. Otherwise mcentral keeps managing the span directly: if it still has free slots it goes onto the partial swept list, and if it is full it goes onto the full swept list.
func (c *mcentral) uncacheSpan(s *mspan) {
	if s.allocCount == 0 {
		throw("uncaching span but s.allocCount == 0")
	}

	sg := mheap_.sweepgen
	stale := s.sweepgen == sg+1

	// Fix up sweepgen.
	if stale {
		// Span was cached before sweep began. It's our
		// responsibility to sweep it.
		//
		// Set sweepgen to indicate it's not cached but needs
		// sweeping and can't be allocated from. sweep will
		// set s.sweepgen to indicate s is swept.
		atomic.Store(&s.sweepgen, sg-1)
	} else {
		// Indicate that s is no longer cached.
		atomic.Store(&s.sweepgen, sg)
	}

	// Put the span in the appropriate place.
	if stale {
		// It's stale, so just sweep it. Sweeping will put it on
		// the right list.
		//
		// We don't use a sweepLocker here. Stale cached spans
		// aren't in the global sweep lists, so mark termination
		// itself holds up sweep completion until all mcaches
		// have been swept.
		ss := sweepLocked{s}
		ss.sweep(false)
	} else {
		if int(s.nelems)-int(s.allocCount) > 0 {
			// Put it back on the partial swept list.
			c.partialSwept(sg).push(s)
		} else {
			// There's no free space and it's not stale, so put it on the
			// full swept list.
			c.fullSwept(sg).push(s)
		}
	}
}
Notice that both partial and full in mcentral are spanSet arrays with two elements. This is a double-buffering strategy: because sweeping runs concurrently with user goroutines, one set of each pair holds swept spans that are safe to allocate from while the other accumulates unswept spans, and the two swap roles on the next GC cycle. This guarantees there is always a set to allocate from, instead of serially waiting for sweeping to complete before allocating, trading space for time to improve response latency.
type mcentral struct {
	spanclass spanClass
	partial   [2]spanSet // list of spans with a free object
	full      [2]spanSet // list of spans with no free objects
}
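The two sets are selected by the parity of the global sweepgen, which the runtime advances by 2 each GC cycle. The sketch below models that index arithmetic (the helper names mirror mcentral's partialSwept/partialUnswept accessors, but this is a standalone illustration, not runtime code):

```go
package main

import "fmt"

// sweptIdx and unsweptIdx pick one of the two spanSets from the sweepgen.
// sweepgen increases by 2 every GC cycle, so the parity of sweepgen/2
// flips each cycle and the two sets swap roles automatically.
func sweptIdx(sweepgen uint32) uint32   { return sweepgen / 2 % 2 }
func unsweptIdx(sweepgen uint32) uint32 { return 1 - sweepgen/2%2 }

func main() {
	for _, sg := range []uint32{0, 2, 4, 6} {
		fmt.Printf("sweepgen=%d swept=%d unswept=%d\n", sg, sweptIdx(sg), unsweptIdx(sg))
	}
	// The swept set of one cycle becomes the unswept set of the next.
}
```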
The grow method in mcentral involves the memory allocation and management of mheap, which is described below.
mheap
mheap is analogous to PageHeap in TCMalloc: it represents the heap space held by Go, and the spans managed by mcentral are obtained from here. When mcentral has no free spans it asks mheap; when mheap itself has no resources, it requests memory from the operating system. Requests to the OS are made in units of OS pages (typically 4KB), and the acquired memory is then organized into the runtime's own levels: page (8KB), span (a multiple of pages), chunk (512KB), and heapArena (64MB).
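The arithmetic relating these levels is worth spelling out. The constants below use the values quoted above (typical for 64-bit platforms such as linux/amd64; other configurations can differ):

```go
package main

import "fmt"

// Sizes of the levels described above (assumed 64-bit platform values).
const (
	pageSize      = 8 << 10   // 8 KB runtime page
	chunkSize     = 512 << 10 // 512 KB chunk
	heapArenaSize = 64 << 20  // 64 MB heapArena
)

func main() {
	fmt.Println(chunkSize / pageSize)      // pages per chunk
	fmt.Println(heapArenaSize / pageSize)  // pages per arena
	fmt.Println(heapArenaSize / chunkSize) // chunks per arena
}
```

So one chunk covers 64 pages, and one heapArena covers 8192 pages (128 chunks).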
Bitmap cache for pageCache
The grow method in mcentral will call the alloc method of mheap
// grow allocates a new empty span from the heap and initializes it for c's size class.
func (c *mcentral) grow() *mspan {
	npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
	size := uintptr(class_to_size[c.spanclass.sizeclass()])

	s, _ := mheap_.alloc(npages, c.spanclass, true)
	if s == nil {
		return nil
	}

	// Use division by multiplication and shifts to quickly compute:
	// n := (npages << _PageShift) / size
	n := s.divideByElemSize(npages << _PageShift)
	s.limit = s.base() + size*n
	heapBitsForAddr(s.base()).initSpan(s)
	return s
}
Then the allocSpan method is called internally.
func (h *mheap) alloc(npages uintptr, spanclass spanClass, needzero bool) (*mspan, bool) {
	// Don't do any operations that lock the heap on the G stack.
	// It might trigger stack growth, and the stack growth code needs
	// to be able to allocate heap.
	var s *mspan
	systemstack(func() {
		// To prevent excessive heap growth, before allocating n pages
		// we need to sweep and reclaim at least n pages.
		if !isSweepDone() {
			h.reclaim(npages)
		}
		s = h.allocSpan(npages, spanAllocHeap, spanclass)
	})

	if s == nil {
		return nil, false
	}
	isZeroed := s.needzero == 0
	if needzero && !isZeroed {
		memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
		isZeroed = true
	}
	s.needzero = 0
	return s, isZeroed
}
In allocSpan, if the area to be allocated is small and no physical page alignment is required, the space is first taken from the logical processor's pageCache, avoiding the global heap lock (again trading space for time).
The first part of the function below shows this attempt to allocate from P's pcache.
func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan) {
	// Function-global state.
	gp := getg()
	base, scav := uintptr(0), uintptr(0)

	// On some platforms we need to provide physical page aligned stack
	// allocations. Where the page size is less than the physical page
	// size, we already manage to do this by default.
	needPhysPageAlign := physPageAlignedStacks && typ == spanAllocStack && pageSize < physPageSize

	// If the allocation is small enough, try the page cache!
	// The page cache does not support aligned allocations, so we cannot use
	// it if we need to provide a physical page aligned stack allocation.
	pp := gp.m.p.ptr()
	if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
		c := &pp.pcache

		// If the cache is empty, refill it.
		if c.empty() {
			lock(&h.lock)
			*c = h.pages.allocToCache()
			unlock(&h.lock)
		}

		// Try to allocate from the cache.
		base, scav = c.alloc(npages)
		if base != 0 {
			s = h.tryAllocMSpan()
			if s != nil {
				goto HaveSpan
			}
			// We have a base but no mspan, so we need
			// to lock the heap.
		}
	}

The structure of pageCache is as follows:
The code is in runtime/mpagecache.go
// pageCachePages is the number of pages a pageCache covers:
// 8 bytes * 8 bits = 64 pages, i.e. 64 * 8KB = 512KB of memory.
const pageCachePages = 8 * unsafe.Sizeof(pageCache{}.cache)

// pageCache represents a per-p cache of pages the allocator can
// allocate from without a lock. More specifically, it represents
// a pageCachePages*pageSize chunk of memory with 0 or more free
// pages in it.
type pageCache struct {
	base uintptr // base address of the virtual memory this cache covers
	// cache and scav are both bitmaps: cache marks which pages are free,
	// and scav marks which pages have been scavenged. They speed up
	// allocation and reclamation; under certain GC conditions the two
	// can be exchanged, improving both allocation and GC efficiency.
	cache uint64 // 64-bit bitmap representing free pages (1 means free)
	scav  uint64 // 64-bit bitmap representing scavenged pages (1 means scavenged)
}
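The single-page fast path of allocating from this structure can be modeled in a few lines. The following is a simplified sketch (the real pageCache.alloc also handles multi-page runs and interacts with the heap's scavenger, which is omitted here):

```go
package main

import (
	"fmt"
	"math/bits"
)

const pageSize = 8 << 10 // 8 KB pages, as in the Go runtime

// pageCache is a simplified model of the runtime struct above.
type pageCache struct {
	base  uintptr
	cache uint64 // 1 bit = free page
	scav  uint64 // 1 bit = scavenged page
}

// allocOne finds the lowest free page, marks it allocated, and returns
// its address; it returns 0 if the cache is empty (the caller would then
// refill the cache under the heap lock).
func (c *pageCache) allocOne() uintptr {
	if c.cache == 0 {
		return 0
	}
	i := uintptr(bits.TrailingZeros64(c.cache))
	c.cache &^= 1 << i // page is now allocated
	c.scav &^= 1 << i  // an allocated page is no longer scavenged
	return c.base + i*pageSize
}

func main() {
	c := pageCache{base: 0x100000, cache: 0b1101} // pages 0, 2, 3 free
	fmt.Printf("%#x\n", c.allocOne())             // page 0
	fmt.Printf("%#x\n", c.allocOne())             // page 2 (page 1 is in use)
}
```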
Let’s go back to the allocSpan method of mheap
radix tree
If the pageCache cannot satisfy the allocation, or it has no free space left, the global heap lock is taken and memory is obtained from mheap directly:
// For one reason or another, we couldn't get the
// whole job done without the heap lock.
lock(&h.lock)
.................
if base == 0 {
	// Try to acquire a base address.
	base, scav = h.pages.alloc(npages)
	if base == 0 {
		if !h.grow(npages) {
			unlock(&h.lock)
			return nil
		}
		base, scav = h.pages.alloc(npages)
		if base == 0 {
			throw("grew heap, but no adequate free space found")
		}
	}
}
................
unlock(&h.lock)
Here the allocation first goes through mheap's pages field, an instance of the pageAlloc structure. It is managed as a radix tree of at most 5 levels, where each node corresponds to a pallocSum value. Each non-leaf node summarizes the memory information of 8 consecutive child nodes, so the higher a node sits, the more memory it covers; a root-level entry represents 16GB of address space. The tree also carries some search optimizations.
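Each pallocSum packs three page counts into a single uint64 using 21-bit fields: the run of free pages at the start of the region, the longest run anywhere inside it, and the run at the end. The sketch below models that packing (the real runtime additionally reserves the top bit to flag a saturated max value, which is omitted here):

```go
package main

import "fmt"

// Each of the three summary fields gets 21 bits, enough for the page
// counts the radix tree needs to represent.
const logMaxPackedValue = 21
const maxPackedValue = 1 << logMaxPackedValue

type pallocSum uint64

// packPallocSum packs (start, max, end) free-page runs into one word.
func packPallocSum(start, max, end uint64) pallocSum {
	return pallocSum(start | max<<logMaxPackedValue | end<<(2*logMaxPackedValue))
}

func (p pallocSum) start() uint64 { return uint64(p) & (maxPackedValue - 1) }
func (p pallocSum) max() uint64   { return (uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1) }
func (p pallocSum) end() uint64   { return (uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1) }

func main() {
	s := packPallocSum(3, 100, 7)
	fmt.Println(s.start(), s.max(), s.end()) // round-trips the three fields
}
```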
When mheap itself has no space left, it turns to the operating system: mheap's grow function calls pageAlloc's grow and sysGrow methods, which internally call the platform-specific sysUsed method to request memory from the OS.
Another point worth noting in mheap is how it manages the mcentrals:
//path: /usr/local/go/src/runtime/mheap.go
type mheap struct {
	lock mutex
	// spans: maps pages back to the mspans they belong to
	spans []*mspan
	// points to the start of the bitmap; the bitmap grows from high addresses toward low
	bitmap uintptr
	// start address of the arena region
	arena_start uintptr
	// highest used address in the arena region
	arena_used uintptr
	// end address of the arena region
	arena_end uintptr
	central [67 * 2]struct {
		mcentral mcentral
		pad      [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
	}
}
First, notice sys.CacheLineSize: each mcentral is padded out to a multiple of the cache line size to prevent the performance loss caused by false sharing between CPU cores (for background on false sharing, see my article:
https://www.cnblogs.com/dojo-lzz/p/16183006.html).
Second, note that there are 67×2=134 mcentrals: each size class exists in two variants, one for objects containing pointers and one for pointer-free objects. Keeping them apart improves garbage collection efficiency and thus overall performance.
A diagram (omitted here) shows this layout more clearly.
To sum up, Go achieves fine-grained memory management and strong performance through careful object-size classification, aggressive multi-level caching with lock-free fast paths, and precise bitmap bookkeeping.
This article took about a month to write. Reading the source directly made one thing clear: much of the existing material on Go memory allocation is either outdated or simply repeats what others have said. Independent study and practice are the best way to get at the essence.