当前位置：首页 > article >正文

Golang | 每日一练 (3)

article 2025/2/22 19:15:56

💢欢迎来到张胤尘的技术站
💥技术如江河，汇聚众志成。代码似星辰，照亮行征程。开源精神长，传承永不忘。携手共前行，未来更辉煌💥

文章目录

Golang | 每日一练 (3)
- 题目
- 参考答案
- - `map` 实现原理
  - - `hmap`
    - `bmap`
    - 数据存储模型
    - 键值底层访问
    - - 竞态检测
      - `Sanitizer` 检测
      - 空检查
      - 并发写检查
      - 哈希值计算
      - 桶定位
      - 扩容处理
      - 桶内查找
  - 安全并发访问 `map`
  - - 使用 `sync.Mutex` 或者 `sync.RWMutex`
    - 并发安全的 `sync.Map`

Golang | 每日一练 (3)

题目

golang 中的 map 是如何实现的？如何安全地并发访问 map？

参考答案

`map` 实现原理

golang 中，map 是一种灵活且高效的数据结构，底层是基于哈希表实现，键值对存储数据。

在 map 的内部由两个核心结构体组成：hmap 和 bmap 。

`hmap`

hmap 是 golang 中 map 的底层核心数据结构之一，它封装了哈希表的所有关键信息。源码如下所示：

源码位置：src/runtime/map.go

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

count：表示当前 map 中存储的键值对数量，len() 直接访问它。
flags：用于存储一些标志位，例如是否正在扩容、是否需要清理等。
B：表示桶的数量为 2^B。例如，如果 B = 3，则桶的数量为 2^3 = 8。B 的值决定了 buckets 数组的大小。
noverflow：表示溢出桶的数量。当一个桶满了（最多存储 8 对键值对），会创建一个溢出桶，并通过链表连接。
hash0：哈希种子，用于计算键的哈希值。通过哈希种子可以避免哈希冲突，增加哈希值的随机性。
buckets：指向底层桶数组的指针，桶数组的大小为 2^B，每个桶是一个 bmap 结构体。
oldbuckets：在扩容时，旧的桶数组会存储在这里。新的桶数组大小是旧数组的两倍。扩容完成后，oldbuckets 会被释放。
nevacuate：用于记录扩容进度的计数器。表示已经迁移完成的桶数量。
extra：存储一些可选字段。

`bmap`

bmap 是存储键值对的“桶”，每个桶可以存储最多 8 对键值对。源码如下所示：

源码位置：src/runtime/map.go

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [abi.MapBucketCount]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

tophash：abi.MapBucketCount 是一个常量，值为 8。存储每个键的哈希值的最高字节（top byte）。特殊情况：如果 tophash[0] < minTopHash，则表示该桶处于迁移状态，而不是存储键的哈希值。

// Maximum number of key/elem pairs a bucket can hold.
MapBucketCountBits = 3 // log2 of number of elements in a bucket.
MapBucketCount     = 1 << MapBucketCountBits

minTopHash     = 5 // minimum tophash for a normal filled cell.

在 tophash 之后，bmap 会依次存储键和值。键和值的存储方式是连续的：先存储所有键（bucketCnt 个键），然后存储所有值（bucketCnt 个值）。这种设计可以减少内存对齐时的填充，从而节省内存。例如，在 map[int64]int8 的情况下，如果键和值交替存储，可能会因为对齐问题浪费内存。
最后在每个 bmap 的末尾包含一个指向下一个溢出桶的指针（overflow）。当一个桶满了（最多存储 8 对键值对）时，会通过链表的方式创建一个溢出桶，用于存储额外的键值对。

数据存储模型

例如：在64位平台上有一个 map[int]string，键的大小为 8 字节（int64），值的大小为 16 字节（指向字符串数据的指针（8 字节）和字符串的长度（8 字节））。一个 bmap 的内存布局如下所示：

在这里插入图片描述

键值底层访问

由 map 数据存储模型可知，因为键和值的存储是动态的，访问键和值时就需要通过指针偏移来实现。源码内容如下所示：

源码位置：src/runtime/map.go

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // ...
}

t *maptype：map 的类型信息，例如键的类型、值的类型、哈希函数等。
h *hmap： map 的底层结构，包含哈希表的元数据（如桶数组、键值对数量等）。
key unsafe.Pointer：查找目标键的指针。
返回值：返回指向值的指针。如果键不存在，则返回值类型的零值的指针。

对于 mapaccess1 源码的流程进行了如下步骤的拆解：

竞态检测
Sanitizer 检测
空检查
并发写检查
哈希值计算
桶定位
扩容处理
桶查找

下面针对每一步骤进行详细说明。

竞态检测

如果启用了竞态检测，记录当前操作的上下文，以便在发生竞态时能够追踪。

if raceenabled && h != nil {
    callerpc := getcallerpc()
    pc := abi.FuncPCABIInternal(mapaccess1)
    racereadpc(unsafe.Pointer(h), callerpc, pc)
    raceReadObjectPC(t.Key, key, callerpc, pc)
}

`Sanitizer` 检测

msanread 会标记对 key 的读取操作，确保该内存区域已经被正确初始化。防止程序读取未初始化的内存，从而避免潜在的未定义行为。

asanread 会检查对 key 的读取操作是否安全，例如是否越界或是否访问了已释放的内存。

if msanenabled && h != nil {
    msanread(key, t.Key.Size_)
}
if asanenabled && h != nil {
    asanread(key, t.Key.Size_)
}

空检查

如果 map 为空，直接返回值类型的零值。

if h == nil || h.count == 0 {
    if err := mapKeyError(t, key); err != nil {
        panic(err) // see issue 23734
    }
    return unsafe.Pointer(&zeroVal[0])
}

并发写检查

如果 map 正在被写入（hashWriting 标志被设置），抛出并发读写错误，这一点也表明 map 并不支持并发安全。

if h.flags&hashWriting != 0 {
    fatal("concurrent map read and map write")
}

哈希值计算

使用键的哈希函数计算哈希值，其中 h.hash0 是哈希种子，用于避免哈希冲突。

hash := t.Hasher(key, uintptr(h.hash0))

桶定位

代码中使用哈希值的低 B 位（bucketMask(h.B)）定位到初始桶。其中 BucketSize 是每个桶的大小（包括键和值的存储）。

m := bucketMask(h.B)
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))

扩容处理

如果当前真正在扩容，那么检查旧的桶数组 oldbuckets。如果旧桶尚未迁移完成（!evacuated(oldb)），使用旧桶进行查找。

if c := h.oldbuckets; c != nil {
    if !h.sameSizeGrow() {
        // There used to be half as many buckets; mask down one more power of two.
        // 如果扩容前的桶数量是当前的一半，进一步掩码哈希值
        m >>= 1
    }
    oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
    if !evacuated(oldb) {
        b = oldb
    }
}

桶内查找

// 获取键的哈希值的最高字节
top := tophash(hash)
bucketloop:
	// 遍历当前桶及其溢出桶
	for ; b != nil; b = b.overflow(t) {
        // 遍历桶内的所有键值对
		for i := uintptr(0); i < abi.MapBucketCount; i++ {
            // 检查当前个键的哈希值最高字节是否匹配，如果不相等，说明当前槽位的键与目标键不匹配
			if b.tophash[i] != top {
                // 如果遇到空槽（emptyRest）跳出桶循环，因为后续槽位都是空的
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
                // 不匹配则跳过当前槽位
				continue
			}
            
           	// 计算第 i 个键的地址
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
            // 如果键是指针类型，解引用键的指针
			if t.IndirectKey() {
				k = *((*unsafe.Pointer)(k))
			}
            
            // 比较传入的键和当前键是否相等
			if t.Key.Equal(key, k) {
                // 计算第 i 个值的地址
				e := add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
				// 如果值是指针类型，解引用值的指针
                if t.IndirectElem() {
					e = *((*unsafe.Pointer)(e))
				}
                // 返回值的指针
				return e
			}
		}
	}
	
	// 如果未找到，则返回值类型的零值指针
	return unsafe.Pointer(&zeroVal[0])

根据上一小结中的 bmap 数据存储模型可知，桶内查找中最为重要的就是键的偏移量计算和值的偏移量计算，如下所示：

k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))

其中，dataOffset 是 tophash 数组之后的偏移量，i * uintptr(t.KeySize) 是计算得第 i 个键的偏移量。

e := add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))

其中，dataOffset + abi.MapBucketCount*uintptr(t.KeySize) 是值的起始偏移量（所有键之后）；i * uintptr(t.ValueSize) 是第 i 个值的偏移量。

安全并发访问 `map`

由前一节可知， map 并非并发安全的，直接在多个 goroutine 中并发的修改同一个 map 时，会导致数据竞争和不可预测的行为。

为了安全地并发访问 map，可以采用以下几种方法：

使用 `sync.Mutex` 或者 `sync.RWMutex`

代码如下所示：

package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.RWMutex
	myMap := make(map[int]int)

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // 启动100个协程
		wg.Add(1)
		go func(val int) {
			defer wg.Done()
			mu.Lock() // 加写锁
			myMap[val] = val * 2
			mu.Unlock() // 释放写锁
		}(i)
	}

	wg.Wait() // 等待协程结束

	mu.RLock() // 加读锁
	fmt.Println("Map content:", myMap)
	mu.RUnlock() // 释放读锁
}

在读取时使用 mu.RLock() 和 mu.RUnlock()，在写入时使用 mu.Lock() 和 mu.Unlock()。另外 sync.RWMutex 允许多个读操作并发执行，但写操作互斥。

并发安全的 `sync.Map`

代码如下所示：

package main

import (
	"fmt"
	"sync"
)

func main() {
	var myMap sync.Map

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(val int) {
			defer wg.Done()
			myMap.Store(val, val*2) // 存储键值对
		}(i)
	}

	wg.Wait()

	// Key: 0, Value: 0
	// Key: 2, Value: 4
	// Key: 3, Value: 6
	// Key: 7, Value: 14
	// Key: 8, Value: 16
	// Key: 9, Value: 18
	// Key: 1, Value: 2
	// Key: 4, Value: 8
	// Key: 5, Value: 10
	// Key: 6, Value: 12
	myMap.Range(func(key, value interface{}) bool { // 遍历所有键值对
		fmt.Printf("Key: %v, Value: %v\n", key, value)
		return true
	})
}