当前位置：首页 > article >正文

learn C++ NO.25——unordered_set与unordered_map的封装

article 2025/2/28 19:58:45

前言

在上一篇博客中，对开散列哈希与闭散列哈希进行了模拟实现，这篇文章将以上篇文章的哈希桶为容器，对unordered_set与unordered_map进行封装。有了封装map和set的经验，搞一搞这个更加复杂的封装。

unordered_set与unordered_map的封装

定义好模板类结构

第一步，把对应的文件.h文件新建好。包含一下我们上篇文章写的hashTable.h。然后，将两个类模板定义出来。
在这里插入图片描述
将哈希桶的代码进行一些调整，将节点的模板参数部分进行调整，让它多走一层泛型，无论是unordered_map还是unordered_set，他会根据T这个模板是实例化对应的类型。unordered_map就是K V模型的pair，unordered_set就是key。

因为unordered_map的第二个模板参数传递的是一个pair，哈希桶中的insert和find进行key的匹配。所以需要加上一个类模板KeyOfT来获取pair里的Key。这里的unordered_set也得被迫提供一个仿函数来传参。套上KeyOfT模板后，还需要将节点的模板调整一下，然后将insert() 和 find() 的 key的比较逻辑进行一下修改，对于unordered_map的pair中的key取出来进行比较。

在这里插入图片描述
下面，修改一下哈希桶的insert函数。将形参修改成数据模板T，并引入模板KeyOfT来取map的pair中的Key。

迭代器的封装

下面实现模拟实现一份迭代器，让两个容器能用迭代器遍历。现将迭代器的模板类设计一下，需要一个成员变量存当前迭代器遍历的节点。由于需要遍历哈希表，所以要有一个成员变量哈希表的表指针。那就引申出一个问题。哈希桶中有迭代器，迭代器中有哈希桶，但是迭代器由于代码实现在哈希桶的上面。编译时，编译器从上往下编译。可能不认识哈希桶，这时候就需要提前声明一下哈希桶。在这里插入图片描述
由于需要在迭代器内部访问哈希桶的私有成员，这里有两种处理方式，一种是对外提供一个get接口访问哈希桶的vector对象。另一种是在哈希桶内声明HTIterator为友元模板类。这里我就以友元的方式处理。然后将begin()和end()接口提供一下，在HTIterator中设计一个哈希桶的指针为成员变量，就是为了在调用begin()或end()时，用哈希桶的this指针作为参数传递构造迭代器。
在这里插入图片描述
哈希桶仅仅支持单项迭代器。迭代器部分重点说一下operator++ 的实现。当前迭代器指向的节点如果存在，分两种情况。情况一是当前节点的下一跳地址不为空。情况二是当前的节点的下一跳为空或当前节点为空。情况一的处理逻辑就是将迭代器指向的下一跳节点，并返回当前迭代器。情况二就需要先计算哈希值，然后++哈希值，随后从新的哈希值开始遍历整个桶，直到碰到一个节点。然后返回一个指向这个节点的迭代器。如果没有找到节点，就返回nullptr。
在这里插入图片描述

适配出const迭代器

通过封装map和set的经验，其实这里的迭代器还是一个残血版的迭代器。下面我在增加两个模板参数 Ptr 和 Ref，适配出const迭代器，并解决当前迭代器可以修改key的问题。这里的迭代器的实现还有一个关于const对象权限放大的问题需要解决。因为，迭代器封装的时候还要有一个哈希桶对象指针用于遍历哈希桶。所以，需要解决const迭代器权限放大的问题。
在这里插入图片描述

关于将迭代器内的哈希桶成员改为const 成员的原因如下，因为const迭代器的begin() 和 end() 都用了const来修饰this指针。如果迭代器内的哈希桶成员为普通成员，那么势必导致权限的放大，编译器就会直接报错。若修改为const 对象，const对象可以调用，普通对象也可以调用，因为权限可以缩小。

unordered_set由于不支持修改，所以只需要提供const迭代器即可，大家的使用习惯还是倾向于用普通迭代器去遍历容器，用const迭代器typedef一个普通迭代器对外提供即可。

为了避免unordered_map的Key被修改，需要类型实例化哈希桶的第二个模板参数T时，将pair的第一个类型用const修饰一下，这样就避免了Key可以被修改了。

unordered_map的operator[]的重载

首先，我们需要修改一下insert的返回值。库里面是提供了返回值为pair<iterator , bool>的insert接口。不仅要修改insert接口，还要将find接口的返回值改成迭代器。
在这里插入图片描述

然后调整一下insert接口的返回值。

在这里插入图片描述
此时，unordered_set的插入接口就没办法正常跑通了。因为unordered_set只有const迭代器，而哈希桶的insert返回值pair里面的迭代器是一个普通迭代器。相应的类型就不能匹配。在set封装的文章中，为了解决这一问题，需要迭代器提供一个普通迭代器对象构造const迭代器对象的特殊构造函数。这样就能解决insert返回值对于unordered_set的迭代器不匹配的问题。

在这里插入图片描述
解决这一问题后，接下来就是实现unordered_map的operator[]。实现思路就是通过复用insert来实现插入新数据或修改Val。

总结

unordered_set与unordered_map的封装解决了如下问题，解决了同一份哈希桶内实例化成两份不同的代码。还解决了迭代器operator++的实现。以及用正向迭代器适配出const迭代器时，const版本begin() const 和 end() const 引发的权限放大问题。以及为了实现operator[] 修改insert返回值引发的unordered_set的迭代器类型不匹配的问题。通过逐步封装与解决问题，不仅仅提升了代码能力，还加深了对于哈希桶的理解。

参考代码

#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <utility>


template<class K>
struct defaultHashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

template<>
struct defaultHashFunc<std::string>
{
	size_t operator()(const std::string& str)
	{
		//BKDR 算法
		size_t ret = 0;
		for (auto ch : str)
		{
			ret = ret * 131 + ch;
		}
		return ret;
		//return (size_t)str[0];
	}
};

namespace open_addr
{
	enum STATE
	{
		EXIST, // 当前位置存在值
		DELETE, //位置值已删除 
		EMPTY //当前位置为空
	};


	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv; // 数据
		STATE _state = EMPTY; // 状态
	};

	//struct stringHashFunc
	//{
	//	size_t operator()(const string& str)
	//	{
	//		return (size_t)str[0];
	//	}
	//};


	template<class K, class V, class HashFunc = defaultHashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_hashTable.resize(10);
		}

		bool insert(const pair<K, V>& kv)
		{
			//维持key的唯一性
			if (find(kv.first))
				return false;
			//扩容逻辑
			//if ((double)_n / _hashTable.size() >= 0.75)
			if (_n * 10 / _hashTable.size() >= 7)

			{
				//创建新表
				size_t newSize = _hashTable.size() * 2;
				HashTable<K, V, HashFunc> newTable;
				newTable._hashTable.resize(newSize);

				//将旧表的数据插入到新表
				for (size_t i = 0; i < _hashTable.size(); i++)
				{
					if (_hashTable[i]._state == EXIST)
						newTable.insert(_hashTable[i]._kv);//复用当前insert的线性探测逻辑
				}

				_hashTable.swap(newTable._hashTable);//类似于拷贝构造之前的现代写法
			}

			//线性探测
			HashFunc func;
			size_t hashI = func(kv.first) % _hashTable.size(); // % capacity()可能会导致[]访问的越界

			while (_hashTable[hashI]._state == EXIST)//EXIST状态表示当前位置值有效
			{
				++hashI;

				hashI %= _hashTable.size(); //当遍历到最后一个位置没法插入，得回到起始位置再匹配。
			}

			//插入值并修改状态
			_hashTable[hashI]._kv = kv;
			_hashTable[hashI]._state = EXIST;
			++_n;

			return true;
		}

		HashData<const K, V>* find(const K& key)
		{
			//线性探测
			HashFunc func;
			size_t hashi = func(key) % _hashTable.size();

			while (_hashTable[hashi]._state != EMPTY)
			{
				if (_hashTable[hashi]._state == EXIST
					&& _hashTable[hashi]._kv.first == key)
				{
					//强转保证可移植性
					return (HashData<const K, V>*) & _hashTable[hashi]._kv;
				}

				++hashi;
				hashi %= _hashTable.size();
			}

			return nullptr;
		}

		bool erase(const K& key)
		{
			HashData<const K, V>* ret = find(key);

			if (ret)
			{
				ret->_state = DELETE;
				--_n;

				return true;
			}
			return false;
		}


	private:
		vector<HashData<K, V>> _hashTable;
		size_t _n = 0; // 记录当前有效元素个数，计算负载因子
	};
}

namespace hash_bucket
{
	template<class T>
	struct hashNode
	{
		T _data;
		hashNode<T>* _next;

		hashNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};

	//前置声明，避免编译报错。声明模板可以不带默认参数
	template<class K, class T, class KeyOfT, class HashFunc>
	class HashTable;

	template<class K, class T, class Ptr, class Ref, class KeyOfT, class HashFunc>
	struct HTIterator
	{
		typedef hashNode<T> Node;
		typedef HTIterator<K, T, Ptr, Ref, KeyOfT, HashFunc> Self;
		typedef HTIterator<K, T, T*, T&, KeyOfT, HashFunc> iterator;

		Node* _node;
		const HashTable<K, T, KeyOfT, HashFunc>* _ptable; // 为了兼容const迭代器，将对象变成const对象

		//HTIterator(Node* node, HashTable<K, T, KeyOfT, HashFunc>* pt)
		//	:_node(node)
		//	, _ptable(pt)
		//{}

		// 参数为普通迭代器时，就是普通迭代器的拷贝构造
		// 参数为const迭代器时，就是普通迭代器构造const迭代器的特殊构造
		HTIterator(const iterator& it)
			:_node(it._node)
			, _ptable(it._ptable)
		{}

		HTIterator(Node* node, const HashTable<K, T, KeyOfT, HashFunc>* pt)
			:_node(node)
			, _ptable(pt)
		{}

		T& operator*()
		{
			return _node->_data;
		}

		T* operator->()
		{
			return &_node->_data;
		}

		bool operator!=(const Self& s)
		{
			return _node != s._node;
		}

		bool operator==(const Self& s)
		{
			return _node == s._node;
		}

		Self& operator++()
		{
			if (_node->_next)//当前Node不处于链表尾部
			{
				_node = _node->_next;
			}
			else//处于链表尾部
			{
				HashFunc hf;
				KeyOfT kot;
				//计算哈希值
				size_t hashi = hf(kot(_node->_data)) % _ptable->_table.size();
				++hashi;

				//遍历剩余的表，看看有没有节点
				while (hashi < _ptable->_table.size())
				{
					if (_ptable->_table[hashi])
					{
						_node = _ptable->_table[hashi];
						return *this;
					}
					else
					{
						++hashi;
					}
				}

				_node = nullptr;
			}

			return *this;
		}
	};

	template<class K, class T, class KeyOfT, class HashFunc = defaultHashFunc<K>>
	class HashTable
	{
		typedef hashNode<T> Node;

		template<class K, class T, class Ptr, class Ref, class KeyOfT, class HashFunc>
		friend struct HTIterator; // 定义友缘模板，让HTIterator可以访问HashTable的私有成员

	public:
		typedef HTIterator< K, T, T*, T&, KeyOfT, HashFunc> iterator;
		typedef HTIterator< K,  T, const T*, const T&, KeyOfT, HashFunc> const_iterator; 

		iterator begin()
		{
			//找第一个节点
			for (int i = 0; i < _table.size(); i++)
			{
				Node* cur = _table[i];
				if (cur != nullptr)
				{
					return iterator(cur, this);
				}
			}
			return iterator(nullptr, this);
		}

		iterator end()
		{
			return iterator(nullptr, this);
		}

		const_iterator begin() const
		{
			//找第一个节点
			for (int i = 0; i < _table.size(); i++)
			{
				Node* cur = _table[i];
				if (cur != nullptr)
				{
					return const_iterator(cur, this);
				}
			}
			return const_iterator(nullptr, this);
		}

		const_iterator end() const
		{
			return const_iterator(nullptr, this);
		}

		HashTable()
		{
			_table.resize(10, nullptr);//给桶一个初始值
		}

		HashTable(const HashTable<K, T, KeyOfT, HashFunc>& HT)
		{
			_table.resize(HT._table.size());
			for (int i = 0; i < HT._table.size(); i++)
			{
				Node* cur = HT._table[i];

				while (cur)
				{
					Node* newn = new Node(cur->_kv);//创建新节点
					newn->_next = _table[i];//将原头节点连接到新节点的后面
					_table[i] = newn; //覆盖原头节点的位置
					++_n;

					cur = cur->_next;//迭代
				}
			}
		}


		~HashTable()
		{
			//遍历桶，依次删除节点
			for (int i = 0; i < _table.size(); i++)
			{
				Node* cur = _table[i];

				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}

				cur = nullptr;
			}
		}

		pair<iterator, bool> insert(const T& data)
		{
			HashFunc hf;
			KeyOfT kot;

			iterator it = find(kot(data));
			if (it != end())
				return make_pair(it, false);

			//扩容
			//...
			if (_n == _table.size())
			{
				size_t newSize = _table.size() * 2;// 2倍扩容
				HashTable<K, T, KeyOfT, HashFunc> newTable; //创建新表
				newTable._table.resize(newSize, nullptr);//将空间扩容后，初始化成空

				//依次将旧表节点链接到新表
				for (int i = 0; i < _table.size(); i++)
				{
					Node* cur = _table[i];

					while (cur)
					{
						Node* next = cur->_next;//记录下一跳节点
						size_t hashi = hf(kot(data)) % newSize;//映射哈希值

						cur->_next = newTable._table[hashi];//将原头节点连接到新节点的后面
						newTable._table[hashi] = cur; //覆盖原头节点的位置

						cur = next; //迭代
					}
					_table[i] = nullptr;//置空原表
				}
				_table.swap(newTable._table);//将新表跟旧表交换一下。
			}

			//插入数据
			size_t hashi = hf(kot(data)) % _table.size();

			Node* newn = new Node(data);//创建新节点
			newn->_next = _table[hashi];//将原头节点连接到新节点的后面
			_table[hashi] = newn; //覆盖原头节点的位置

			++_n;
			return make_pair(iterator(newn, this), true);
		}


		//维持key的唯一性
		iterator find(const K& key)
		{
			HashFunc hf;
			KeyOfT kot;
			//获取哈希值
			size_t hashi = hf(key) % _table.size();

			//遍历链表
			Node* cur = _table[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return iterator(cur, this);
				}
				cur = cur->_next;
			}
			return end();
		}

		bool erase(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _table.size();

			Node* prev = nullptr;
			Node* cur = _table[hashi];

			while (cur)
			{
				if (cur->_kv.first == key && prev == nullptr) //头删
				{
					_table[hashi] = cur->_next;
					delete cur;
					cur = nullptr;
					return true;
				}
				else if (cur->_kv.first == key) //中间删除/尾删
				{
					Node* next = cur->_next;
					prev->_next = next;
					delete cur;
					cur = nullptr;
					return true;
				}

				prev = cur;
				cur = cur->_next;
			}
			return false;
		}


		void Print()
		{
			for (int i = 0; i < _table.size(); i++)
			{
				printf("[%d]", i);
				cout << "->";
				Node* cur = _table[i];
				while (cur)
				{
					cout << cur->_kv.first << " : " << cur->_kv.second << " -> ";
					cur = cur->_next;
				}
				cout << "NULL" << endl;
			}
			cout << endl;
		}


	private:
		vector<Node*> _table; //哈希桶
		size_t _n = 0; // 记录数据个数
	};
}

//UnorderedMap.h
#pragma once

#include "hashTable.h"

namespace xyx
{
	template<class K, class V>
	class Unordered_Map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
						//告诉编译器，这是类型
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT>::iterator iterator;
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT>::const_iterator const_iterator;

		iterator begin()
		{
			return _table.begin();
		}

		iterator end()
		{
			return _table.end();
		}

		const_iterator begin() const
		{
			return _table.begin();
		}

		const_iterator end() const
		{
			return _table.end();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _table.insert(kv);
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = _table.insert(make_pair(key, V()));
			return ret.first->second;
		}

	private:
		hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT> _table;
	};
}

// UnorederedSet.h
#pragma once

#include "hashTable.h"

namespace xyx
{
	template<class K>
	class Unordered_Set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
							//告诉编译器，这是类型
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator iterator;
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator const_iterator;

		//unordered_set 的 key 不能被修改
		const_iterator begin() const
		{
			return _table.begin();
		}

		const_iterator end() const
		{
			return _table.end();
		}

		//由于unordered_set的迭代器为const迭代器
		//所以，需要提供一个普通迭代器构造const迭代器的特殊构造函数
		pair<const_iterator, bool> insert(const K& key)
		{
			pair<typename hash_bucket::HashTable<K, K, SetKeyOfT>::iterator, bool> ret = _table.insert(key);
			return pair<const_iterator, bool>(ret.first, ret.second);
		}
	private:
		hash_bucket::HashTable<K, K, SetKeyOfT> _table;
	};
}