当前位置：首页 > article >正文

HashMap源码深度解析（JDK 1.8）

article 2025/3/1 9:36:11

1. 基本概述

HashMap是Java中最常用的集合类之一，它实现了Map接口，提供了键值对的存储方式。在JDK 1.8中，HashMap的实现采用了数组+链表+红黑树的数据结构，相比于JDK 1.7的数组+链表结构，在性能上有了显著提升。

1.1 主要特点

非线程安全
允许null键和null值
不保证元素顺序
初始容量和负载因子影响性能

2. 核心数据结构

2.1 整体结构图

2.2 数据结构定义

// 基本节点类
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;          // 哈希值
    final K key;            // 键
    V value;               // 值
    Node<K,V> next;       // 下一个节点
    
    // 构造函数
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

// 红黑树节点类
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;   // 父节点
    TreeNode<K,V> left;     // 左子节点
    TreeNode<K,V> right;    // 右子节点
    TreeNode<K,V> prev;     // 删除时需要用到
    boolean red;            // 颜色属性
}

3. 关键属性

// 默认初始容量 - 必须是2的幂
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16

// 最大容量
static final int MAXIMUM_CAPACITY = 1 << 30;

// 默认负载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// 链表转红黑树阈值
static final int TREEIFY_THRESHOLD = 8;

// 红黑树转链表阈值
static final int UNTREEIFY_THRESHOLD = 6;

// 树化最小容量
static final int MIN_TREEIFY_CAPACITY = 64;

// 实际存储的键值对数量
transient int size;

// 结构性修改次数
transient int modCount;

// 扩容阈值
int threshold;

// 负载因子
final float loadFactor;

4. 源码实现分析

4.1 哈希算法

HashMap的哈希算法是它性能的关键。一个好的哈希算法应该能够尽可能地减少哈希冲突，使得元素分布更均匀。

JDK 1.8的哈希算法有以下特点：

使用key的hashCode()方法获取初始哈希值
将hashCode的高16位与低16位进行异或运算
这样设计可以让高位数据与低位数据进行混合，增加了随机性

static final int hash(Object key) {
    int h;
    // key为null时返回0，否则进行高低位异或运算
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

举个例子：

// 假设hashCode为: 1111 1111 1111 1111 0000 0000 0000 0000
// 右移16位后:     0000 0000 0000 0000 1111 1111 1111 1111
// 异或结果:       1111 1111 1111 1111 1111 1111 1111 1111

这样设计的原因：

在计算数组下标时，我们用 (n-1) & hash 运算。由于 n 一般比较小（默认16），所以 n-1 的二进制表示中只有低位有效：

n = 16 时：
n - 1 = 15 的二进制：0000 0000 0000 0000 0000 0000 0000 1111

如果不进行高低位异或，那么计算数组下标时只有hashCode的低位（后4位）参与运算，高位的特征就完全丢失了：

// 未进行高低位异或时：
hashCode:     1111 1111 1111 1111 0000 0000 0000 0101
n-1:         0000 0000 0000 0000 0000 0000 0000 1111
& 结果:      0000 0000 0000 0000 0000 0000 0000 0101
// 结果只由最后4位决定

进行高低位异或后，让高位也参与到最终的计算中：

// 进行高低位异或后：
hashCode:     1111 1111 1111 1111 0000 0000 0000 0101
右移16位:     0000 0000 0000 0000 1111 1111 1111 1111
异或结果:     1111 1111 1111 1111 1111 1111 1111 1010
n-1:         0000 0000 0000 0000 0000 0000 0000 1111
& 结果:      0000 0000 0000 0000 0000 0000 0000 1010
// 最终结果同时包含了高位和低位的特征

这样做的好处是：

充分利用了hashCode的所有位，让高位也参与到最终的计算中
在数组长度较小时，也能保证散列的均匀性
计算速度快，仅需一次位运算
对于不同的hashCode，能产生更好的散列效果，减少哈希冲突

4.2 确定数组索引位置

在HashMap中，数组的长度总是2的幂，这样可以通过位运算来优化取模运算：

// 计算数组索引的方法
int index = (n - 1) & hash;

// 等价于但性能更好
int index = hash % n;  // n为数组长度

// 例如：数组长度为16时
// n - 1 = 15 的二进制：0000 1111
// 任何数与15进行与运算，结果一定在0-15之间

4.3 put方法实现详解

put操作是HashMap最核心的方法之一，它的实现涉及到：

数组扩容
链表树化
红黑树插入
链表插入

下面是put方法的详细实现：

public V put(K key, V value) {
    // 调用putVal方法，hash(key)用于计算哈希值
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // 步骤1：如果table为空或长度为0，先进行初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    
    // 步骤2：计算索引位置，如果该位置为空，直接新建节点
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 步骤3：检查第一个节点是否是要找的key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 步骤4：如果是红黑树节点，使用红黑树的方式插入
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 步骤5：遍历链表
        else {
            for (int binCount = 0; ; ++binCount) {
                // 到达链表末尾，插入新节点
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 检查是否需要树化
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // 找到相同的key，跳出循环
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 步骤6：如果找到了相同的key，更新值并返回旧值
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    // 步骤7：修改次数增加，检查是否需要扩容
    ++modCount;
    if (++size > threshold)
        resize();
    return null;
}

4.4 get方法实现详解

get操作的实现相对简单，但也需要考虑红黑树和链表两种情况：

public V get(Object key) {
    Node<K,V> e;
    // 计算hash值并查找节点
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    
    // 步骤1：定位数组位置
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        
        // 步骤2：检查第一个节点
        if (first.hash == hash && 
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        
        // 步骤3：遍历后续节点
        if ((e = first.next) != null) {
            // 如果是红黑树节点，使用红黑树的查找方法
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                
            // 遍历链表查找
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

4.5 删除方法实现

remove操作需要处理节点的删除和可能的树转链表操作：

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                          boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    
    // 步骤1：定位数组位置
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        
        Node<K,V> node = null, e; K k; V v;
        
        // 步骤2：查找要删除的节点
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        
        // 步骤3：执行删除操作
        if (node != null && (!matchValue || (v = node.value) == value ||
                            (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            return node;
        }
    }
    return null;
}

5. 性能优化

5.1 红黑树优化

当链表长度超过8时，会将链表转换为红黑树，这样可以将查询时间从O(n)降低到O(log n)。这个优化主要针对哈希冲突严重的情况。

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // 如果数组容量小于64，优先进行扩容而不是树化
    // 这是因为当数组较小时，哈希冲突的概率会更高，扩容是更好的选择
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

5.2 扩容优化

JDK 1.8中的扩容操作进行了以下几个方面的优化：

巧妙的位运算设计

// 判断元素是否需要移动到新位置
if ((e.hash & oldCap) == 0) {
    // 保持在原位置
} else {
    // 移动到原位置 + oldCap
}

避免了重新计算hash值

// 1.8中通过位运算即可确定新位置
newIndex = e.hash & (newCap - 1);

// 1.7中需要重新计算
newIndex = hash(key) % newLength;

优化了节点迁移策略

// 使用两个链表分别存储需要移动和不需要移动的节点
Node<K,V> loHead = null, loTail = null;    // 原位置链表
Node<K,V> hiHead = null, hiTail = null;    // 原位置+oldCap链表

5.3 哈希算法优化

高低位混合计算

// 将高16位与低16位进行异或，增加随机性
hash = key.hashCode() ^ (key.hashCode() >>> 16)

保证数组长度为2的幂

// 计算大于等于initialCapacity的最小2的幂
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

5.4 并发优化建议

使用合适的初始容量避免频繁扩容

// 预估容量的计算
int initialCapacity = (int) (expectedSize / 0.75f) + 1;
Map<K,V> map = new HashMap<>(initialCapacity);

对于并发场景，选择合适的Map实现

// 线程安全的HashMap
Map<K,V> synchronizedMap = Collections.synchronizedMap(new HashMap<>());
// 或��用ConcurrentHashMap
Map<K,V> concurrentMap = new ConcurrentHashMap<>();

5.5 内存优化

及时清理不用的引用

// 手动清理Map
map.clear();
// 或直接设为null
map = null;

使用合适的数据结构

// 如果key是整数，可以考虑使用SparseArray(Android)
// 如果需要保持插入顺序，使用LinkedHashMap
// 如果需要排序，使用TreeMap

避免过大的初始容量

// 过大的初始容量会浪费内存
// 过小的初始容量会导致频繁扩容
// 需要在实际场景中找到平衡点

5.6 性能监控建议

监控负载因子

float currentLoad = (float)size / table.length;
if (currentLoad > loadFactor) {
    // 考虑是否需要优化初始容量
}

监控冲突率

// 统计链表/树节点的数量与数组大小的比例
float conflictRate = (float)linkedNodes / table.length;

监控扩容频率

// 记录扩容次数
private int resizeCount = 0;
// 在resize()方法中统计
resizeCount++;

这些优化措施能够显著提升HashMap的性能，但具体使用时需要根据实际场景选择合适的优化策略。过度优化可能会适得其反，建议在性能测试的基础上进行针对性优化。

6. 最佳实践

6.1 初始容量设置

// 如果确定元素数量，最好指定初始容量
Map<String, Object> map = new HashMap<>(32);

// 计算初始容量的公式
int initialCapacity = (int) (expectedSize / 0.75f + 1.0f);

6.2 避免无限扩容

// 重写hashCode方法时要注意均匀分布
@Override
public int hashCode() {
    int result = 17;
    result = 31 * result + field1.hashCode();
    result = 31 * result + field2.hashCode();
    return result;
}

6.3 线程安全考虑

// 需要线程安全时使用ConcurrentHashMap
Map<String, Object> safeMap = new ConcurrentHashMap<>();

// 或者使用Collections工具类
Map<String, Object> synchronizedMap = 
    Collections.synchronizedMap(new HashMap<>());