当前位置：首页 > article >正文

第21课-C++[set和map学习和使用]

article 2025/2/22 2:24:42

🌼引言

C++ 标准模板库（STL）中的 set 和 map 是两种非常实用的关联式容器。它们具备快速查找、有序存储的特点，因而在很多需要高效数据管理的场景中被广泛应用。本文将深入讲解 set 和 map 的用法，并通过实际例子分析如何将它们应用到编程问题中。

一、关联式容器的基础知识

1.1 什么是关联式容器？

在 STL 中，容器分为两大类：序列式容器和关联式容器。

序列式容器：包括 vector、deque、list 等，它们的特点是按元素插入的顺序存储数据，并提供快速的随机访问能力。然而，在这些容器中查找某个特定值的效率较低（线性时间复杂度 O(n)），不适合频繁查找的情况。
关联式容器：包括 set、map、multiset、multimap 等，它们采用键值对（key-value）的形式存储数据，可以通过键快速找到相应的值。这类容器的查找、插入和删除操作的时间复杂度较低（O(log n)），且存储的元素自动按键排序。

关联式容器的特点如下：

高效查找：set 和 map 可以在 O(log n) 的时间内查找、插入或删除数据。
自动排序：存储的元素按键自动排序，默认是升序，可以通过自定义比较函数实现降序排序。
键值对存储：对于 map 和 multimap，每个元素都以键值对的形式存储，即每个键对应一个值。

1.2 键值对概念

map 和 multimap 是基于键值对（key-value pair）结构的关联式容器。键值对是一种用于表示一一对应关系的数据结构。在键值对中，key 表示键，value 表示值。通过 key 可以快速定位到 value，这种设计非常适合用于数据查找和映射。

1.在 C++ 中，标准库提供了 pair 结构用于实现键值对。例如：

pair<string, int> p("Alice", 90); 
cout << "Name: " << p.first << ", Score: " << p.second << endl;

pair 中的 first 表示键值，second 则表示实值，在给关联式容器中插入数据时，可以构建 pair 对象。

2.利用函数实现键值对：

make_pair 是一个方便的工具函数，用于创建 pair 对象。通过它，你可以简洁地创建键值对、返回多个值以及处理成对的数据。在 STL 容器（如 map 和 multimap）中使用 make_pair，不仅使代码简洁，而且提高了代码的可读性。

#include <iostream>
#include <vector>
#include <algorithm>
#include <utility>

int main() {
    std::vector<std::pair<int, std::string>> items;

    // 使用 make_pair 添加元素
    items.push_back(std::make_pair(3, "Apple"));
    items.push_back(std::make_pair(1, "Banana"));
    items.push_back(std::make_pair(2, "Cherry"));

    // 按第一个元素排序
    std::sort(items.begin(), items.end());

    for (const auto& item : items) {
        std::cout << item.second << " ";
    }
    std::cout << std::endl;

    return 0;
}

1.3、树型结构的关联式容器

所以在 C++ 标准中，共提供了四种树型结构的关联式容器

set

multiset

map

multimap

关于哈希结构的关联式容器将在哈希表中学习

树型结构与哈希结构的关联式容器功能都是一模一样的，不过哈希结构查找比树型结构快得多 -> O(1)

注： STL 中选择的树型结构为红黑树 RB-Tree 树型结构中的元素中序遍历后有序，而哈希结构中的元素无序

二、`set` 容器详解

set 是 C++ STL 中的一个集合容器，用于存储不重复的元素，且所有元素是有序存储的。它适合需要唯一性和快速查找的场景。

2.1 `set` 的特点

排序+去重

唯一性：set 中的每个元素都是唯一的，不能重复，重复元素不插入。
自动排序：set 会自动对元素进行排序，默认情况下按升序排列。
无键值对：set 只存储键值，数据即键值，不存在 key-value 对。

2.2 `set` 的常用操作

操作	作用
`insert`	插入一个元素
`erase`	删除一个元素
`find`	查找一个元素，返回其迭代器
`count`	统计元素个数，`set` 中最多为 1
`empty`	判断 `set` 是否为空
`size`	返回 `set` 中元素的个数
`clear`	清空 `set` 中的所有元素
`begin`、`end`	返回迭代器，支持遍历

代码演示：

#include <iostream>
#include <vector>
#include <set>
using namespace std;

int main()
{
	vector<int> arr = { 7,3,6,9,3,1,6,2 };
	set<int> s1(arr.begin(), arr.end());

	//迭代器遍历
	cout << "迭代器遍历结果: ";
	set<int>::iterator it = s1.begin();
	while (it != s1.end())
	{
		cout << *it << " ";
		++it;
	}
	cout << endl;

	//判空、求大小
	cout << "===================" << endl;
	cout << "empty(): " << s1.empty() << endl;
	cout << "size(): " << s1.size() << endl;
	cout << "max_size(): " << s1.max_size() << endl;

	//插入元素
	cout << "===================" << endl;
	cout << "insert(5): ";
	s1.insert(5);
	for (auto e : s1) cout << e << " ";
	cout << endl;

	//删除元素
	cout << "===================" << endl;
	cout << "erase(6): ";
	s1.erase(6);
	for (auto e : s1) cout << e << " ";
	cout << endl;

	//交换、查找、清理
	cout << "===================" << endl;
	set<int> s2(arr.begin() + 5, arr.end());
	s1.swap(s2);
	cout << "s1: ";
	for (auto e : s1) cout << e << " ";
	cout << endl;

	cout << "s2: ";
	for (auto e : s2) cout << e << " ";
	cout << endl;

	cout << "s1.find(9): ";
	cout << (s1.find(9) != s1.end()) << endl;

	cout << "s2.clear(): " << endl;
	s2.clear();

	cout << "s1: ";
	for (auto e : s1) cout << e << " ";
	cout << endl;

	cout << "s2: ";
	for (auto e : s2) cout << e << " ";
	cout << endl;

	return 0;
}

至于 count 也可以用来查找元素是否存在，对于 set 来说，键值 key 就是实值 value，并且因为不允许冗余，所以只有一个键值，count 统计键值数量就相当于查找。

#include <iostream>
#include <vector>
#include <set>
using namespace std;


int main()
{
	vector<int> arr = { 7,3,6,9,3,1,6,2 };
	set<int> s1(arr.begin(), arr.end());

	for (int i = 0; i < 10; i++)
	{
		if (s1.count(i))
			cout << i << " 在 set 中" << endl;
		else
			cout << i << " 不在 set 中" << endl;
	}

	return 0;
}

2.3 `set` 的基本用法

示例：创建和插入数据

以下是一个创建 set 的例子，并展示如何插入和删除数据。

#include <iostream>
#include <set>
using namespace std;

int main() {
    set<int> s;  // 创建一个空的 set

    // 插入元素
    s.insert(10);
    s.insert(20);
    s.insert(10);  // 重复插入 10，不会被插入

    // 输出 set 中的元素
    for (int e : s) {
        cout << e << " ";
    }
    cout << endl;

    return 0;
}

示例：查找元素

find 可以用于查找元素是否存在于 set 中。如果存在，返回该元素的迭代器；否则，返回 end 迭代器。

int main() {
    set<int> s = {1, 2, 3, 4, 5};

    // 查找元素 3
    auto it = s.find(3);
    if (it != s.end()) {
        cout << "Element 3 found." << endl;
    } else {
        cout << "Element 3 not found." << endl;
    }

    return 0;
}

set的特点：

2.4 `multiset` 容器

multiset 是 set 的一个变体，multiset 和 set 的操作没什么区别，一模一样，允许存储重复的元素。multiset 的操作与 set 基本相同，只是它允许插入重复元素。示例如下：

#include <iostream>
#include <set>
using namespace std;

int main() {
    multiset<int> ms;
    ms.insert(5);
    ms.insert(5);
    ms.insert(3);

    for (int e : ms) {
        cout << e << " ";
    }
    cout << endl;

    return 0;
}

值得一提的是，当在 multiset 中查找冗余的数据时，返回的是 中序遍历中，第一次出现的元素。

#include <iostream>
#include <vector>
#include <set>
using namespace std;

int main()
{
	vector<int> arr = { 3,5,3,4,5,9,2,3 };
	multiset<int> ms1(arr.begin(), arr.end());

	auto it = ms1.begin();
	while (it != ms1.end())
	{
		cout << *it << " | " << &*(it) << endl;
		++it;
	}

	cout << "================" << endl;

	cout << "ms1.find(3): " << &*(ms1.find(3)) << endl;

	return 0;
}

所以，multiset 才是真正的排序，set 则是去重 + 排序

统计键值数 count 在 multiset 中可以发挥真正效果。

#include <iostream>
#include <vector>
#include <set>
using namespace std;


int main()
{
	vector<int> arr = { 3,5,3,4,5,9,2,3 };
	multiset<int> ms1(arr.begin(), arr.end());

	for (int i = 0; i < 10; i++)
		cout << i << "在 multiset 中的数量: " << ms1.count(i) << endl;

	return 0;
}

三、`map` 容器详解

map 是 C++ STL 中的一种键值对关联式容器，每个元素包含一个键和值。map 中的键是唯一的，不能重复；每个键都映射到一个对应的值。map 在很多场景中被用作哈希表来存储键值对数据。

3.1 `map` 的特点

唯一键：map 中的每个键是唯一的。
键值对存储：map 存储的是键值对，即 key-value。
按键自动排序：map 会自动对键进行排序，默认按升序排列。
可以通过键访问值：可以通过 operator[] 运算符直接访问键对应的值。

3.2 `map` 的常用操作

操作	作用
`insert`	插入一个键值对
`erase`	删除指定键的键值对
`find`	查找指定键，返回其迭代器
`count`	统计键的数量（`map` 中最多为 1）
`operator[]`	根据键访问值，如果键不存在则自动插入新键
`empty`	判断 `map` 是否为空
`size`	返回 `map` 中键值对的个数
`clear`	清空 `map` 中的所有键值对

map 的定义和常用操作

map 位于 <map> 头文件中，定义方式如下：

#include <map>
#include <iostream>
using namespace std;

int main() {
    map<string, int> age; // 创建一个 map，其中键是 string，值是 int

    // 插入键值对
    age["Alice"] = 25;
    age["Bob"] = 30;

    // 查找键值对
    if (age.find("Alice") != age.end()) {
        cout << "Alice's age: " << age["Alice"] << endl;
    }

    // 遍历 map
    for (const auto& entry : age) {
        cout << entry.first << ": " << entry.second << endl;
    }

    return 0;
}

map 的基本用法示例

1. 插入和访问键值对

#include <map>
#include <iostream>
using namespace std;

int main() {
    map<string, int> scores;

    // 插入键值对
    scores.insert(make_pair("Alice", 90));
    scores.insert(make_pair("Bob", 85));
    
    // 也可以通过 operator[] 插入和访问
    scores["Charlie"] = 95;

    // 访问值
    cout << "Alice's score: " << scores["Alice"] << endl;

    return 0;
}

2. 查找和删除键值对

#include <map>
#include <iostream>
using namespace std;

int main() {
    map<string, int> scores = {{"Alice", 90}, {"Bob", 85}, {"Charlie", 95}};

    // 查找键 "Bob"
    auto it = scores.find("Bob");
    if (it != scores.end()) {
        cout << "Found Bob's score: " << it->second << endl;
    }

    // 删除键 "Alice"
    scores.erase("Alice");

    // 输出所有键值对
    for (const auto& entry : scores) {
        cout << entry.first << ": " << entry.second << endl;
    }

    return 0;
}

`3.3 map` 的特点与性能

自动排序：map 中的元素按键自动排序，因此每次插入键值对后，map 都会根据键进行重新排序。
唯一性：map 中不允许重复键，如果试图插入相同键的元素，将会被忽略。
高效查找：map 的查找、插入和删除的时间复杂度为 O(log n)，适合需要快速查找的场景。
默认初始化：使用 operator[] 时，如果键不存在，则会创建一个默认值的键值对。

注意事项

map[key] 如果访问的键不存在，则会自动创建该键，并给其赋一个默认值（如 int 的默认值是 0）。使用 find 可以避免这种不必要的插入。
map 中的数据是按键有序的，因此 map 不支持像 unordered_map 那样的 O(1) 查找。

map 的应用场景

词频统计：可以使用 map 来统计字符串中每个字符的出现频率。
缓存机制：可以通过 map 来实现简单的缓存机制，用于存储经常使用的键值对。
字典实现：map 是存储字典（如配置文件、键值对映射等）的一种常用数据结构。

总计：

总结

map 是一个非常强大的数据结构，可以存储唯一的键值对并按键排序。它在很多场景中都非常实用，尤其适合用于需要快速查找和唯一键的情况。在使用时需要注意 map[key] 会默认初始化键值对的问题，可以通过 find 函数来避免不必要的插入。

四、前K个高频单词

692. 前K个高频单词 - 力扣（LeetCode）

题目分析：题目很短，就是在一个字符串数组中，找出前 k 个出现频率最高的单词

注意：如果出现次数相同，则按字典序排序。（相同次数小写排前面）。

解法：map + 快排

利用 map 建立 <string, int> 的映射关系，在按照字典序排序的同时统计出每个单词的出现频率，再通过快排依照数量进行二次排序，选择前 k 个高频单词即可

因为基础版快排不稳定，可能会导致频率相同的单词顺序出问题，即违背题目要求：如果出现频率相同，则按字典序排序

所以这里需要使用稳定版快排 stable_sort，如果频率相同，保持原有顺序

//map + stable_sort
class Solution {
public:
    struct Compare
    {
        bool operator()(const pair<string, int>& kv1, const pair<string, int>& kv2) const
        {
            return kv1.second > kv2.second;
        }
    };

    vector<string> topKFrequent(vector<string>& words, int k) {
        //统计每个单词出现的频率，同时先按照字典序排序
        map<string, int> table;
        for(auto e : words)
            table[e]++;

        //将处理好的数据存入数组中
        vector<pair<string, int>> vTable(table.begin(), table.end());

        //按照出现频率进行二次排序
        stable_sort(vTable.begin(), vTable.end(), Compare());

        //取出前 k 个高频单词
        vector<string> vs;
        for(int i = 0; i < k; i++)
            vs.push_back(vTable[i].first);
        
        return vs;
    }
};

LCR 154. 复杂链表的复制 - 力扣（LeetCode）

五、补充：交集与差集

交集，指两个数组中相同的元素所构成的集合

求交集的步骤如下：

先将两个数组排序 + 去重
遍历两个数组
如果不相等，小的 ++
相等就是交集，记录下来
其中一方走完，所有交集就查找完了

排序 + 去重，这就不就是 set 吗？

349. 两个数组的交集 - 力扣（LeetCode）

class Solution {
public:
    vector<int> intersection(vector<int>& nums1, vector<int>& nums2) {
      set<int> set1(nums1.begin(),nums1.end());
       set<int> set2(nums2.begin(),nums2.end());

       
       vector<int> num1(set1.begin(),set1.end());
       vector<int> num2(set2.begin(),set2.end());
        int i=0,j=0;
        vector<int> v;

        while(i<num1.size()&&j<num2.size())
        {
            if(num1[i]==num2[j])
            {
                v.push_back(num1[i]);
            }
            else if(num1[i]<num2[j])
            {
                i++;
            }
            else{
                j++;
            }

        }
        return v;
    }
};

总结：

容器	特点	适用场景
`set`	唯一元素，有序存储	去重，查找唯一元素
`multiset`	允许重复元素，有序存储	频率统计，处理重复数据
`map`	键值对存储，唯一键，有序	频率统计，字典查询
`multimap`	键值对存储，允许重复键，有序	频率统计，管理多值的数据