当前位置：首页 > article >正文

【4.2】图搜索算法-DFS和BFS解单词拆分

article 2025/1/12 12:10:25

一、题目

给定一个非空字符串 s 和一个包含非空单词的列表 wordDict，判定 s 是否可以被空格拆分

为一个或多个在字典中出现的单词。

说明：

拆分时可以重复使用字典中的单词。

你可以假设字典中没有重复的单词。

示例 1：
输入:
s = "leetcode ",
wordDict = ["leet", "code"]
输出: true
解释: 返回 true 因为 "leetcode" 可以被拆分成 "leet code"。

示例 2：
输入:
s = " applepenapple ",
wordDict = [" apple ", "pen"]
输出: true
解释: 返回 true 因为 " applepenapple " 可以被拆分成 " apple pen apple"。
注意你可以重复使用字典中的单词。

示例 3：输入:
s = " catsandog",
wo rdDi c t = [" cats ", "dog", " sand", "and", "cat"]
输出: false

二、求解思路

前面我们已经讲解过这道题，使用的是动态规划方法，具体可以参考动态规划解单词拆分。今天我们将分别使用DFS（深度优先搜索）和BFS（广度优先搜索）来解决这个问题。

题目要求将字符串拆分，并判断拆分后的子串是否都存在于给定的字典中。那么，字符串应该如何拆分呢？我们通过一个例子来详细说明，比如字符串 `"abcd"`，我们可以进行如下拆分：

- `["a", "b", "c", "d"]`
- `["a", "b", "cd"]`
- `["a", "bc", "d"]`
- `["a", "bcd"]`
- `["ab", "c", "d"]`
- `["ab", "cd"]`
- `["abc", "d"]`
- `["abcd"]`

具体拆分方式可以参考下图：

每次截取一个子串，判断它是否存在于字典中。如果该子串不存在于字典中，则继续截取更长的子串进行判断；如果存在于字典中，则递归拆分剩下的子串。这是一个递归的过程。

上述执行过程可以看作是对一棵n叉树的深度优先搜索（DFS）遍历。具体来说，每个节点代表一个子串，节点的子节点代表从该子串开始截取的更长子串。通过递归遍历这棵树，我们可以找到所有符合条件的拆分方式。所以大致代码如下：

#include <string>
#include <vector>
#include <unordered_set>

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict) {
    return dfs(s, wordDict);
}

bool dfs(const std::string& s, const std::vector<std::string>& wordDict) {
    // 最终条件，都截取完了，直接返回true
    if (s.empty()) {
        return true;
    }

    // 开始拆分字符串s
    for (size_t i = 1; i <= s.length(); i++) {
        // 截取子串
        std::string sub = s.substr(0, i);

        // 如果截取的子串不在字典中，继续截取更大的子串
        if (std::find(wordDict.begin(), wordDict.end(), sub) == wordDict.end()) {
            continue;
        }

        // 如果截取的子串在字典中，继续剩下的拆分，如果剩下的可以拆分成
        // 在字典中出现的单词，直接返回true，如果不能则继续
        // 截取更大的子串判断
        if (dfs(s.substr(i), wordDict)) {
            return true;
        }
    }

    // 如果都不能正确拆分，直接返回false
    return false;
}

在上述代码中，递归调用必须有一个终止条件，以避免无限递归。通过观察递归过程，我们可以发现，终止条件是当字符串 s 中的所有字符都被遍历完毕时。此时，说明字符串 s 可以被拆分成若干个子串，并且这些子串都存在于给定的字典中。

为了更清晰地理解这个过程，我们可以通过一个图示来说明：

因为是拆分，所以字符串截取的时候不能有重叠，那么[开始截取的位置]实际上就是上次截取位置的下一个，来看下代码

#include <string>
#include <vector>
#include <unordered_set>

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict) {
    return dfs(s, wordDict, 0);
}

// start表示的是从字符串s的哪个位置开始
bool dfs(const std::string& s, const std::vector<std::string>& wordDict, int start) {
    // 字符串中的所有字符都遍历完了，也就是到叶子节点了，说明字符串s可以拆分成
    // 在字典中出现的单词，直接返回true
    if (start == s.length()) {
        return true;
    }

    // 开始拆分字符串s
    for (int i = start + 1; i <= s.length(); i++) {
        // 截取子串
        std::string sub = s.substr(start, i - start);

        // 如果截取的子串不在字典中，继续截取更大的子串
        if (std::find(wordDict.begin(), wordDict.end(), sub) == wordDict.end()) {
            continue;
        }

        // 如果截取的子串在字典中，继续剩下的拆分，如果剩下的可以拆分成
        // 在字典中出现的单词，直接返回true，如果不能则继续
        // 截取更大的子串判断
        if (dfs(s, wordDict, i)) {
            return true;
        }
    }

    // 如果都不能正确拆分，直接返回false
    return false;
}

实际上面代码运行效率很差，这是因为如果字符串s比较长的话，这里会包含大量的重复计算，我们还用上面的图来看下

我们看到红色的就是重复计算，这里因为字符串比较短，不是很明显，当字符串比较长的时候，这里的重复计算非常多。我们可以使用一个变量，来记录计算过的位置，如果之前判断过，就不在重复判断，直接跳过即可，代码如下

三、代码实现

DFS算法解决代码：

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict);
bool dfs(const std::string& s, const std::vector<std::string>& wordDict, std::unordered_set<int>& indexSet, int start);

int main() {
    std::string s = "leetcode";
    std::vector<std::string> wordDict = {"leet", "code"};

    if (wordBreak(s, wordDict)) {
        std::cout << "The string can be segmented into words from the dictionary." << std::endl;
    } else {
        std::cout << "The string cannot be segmented into words from the dictionary." << std::endl;
    }

    return 0;
}

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict) {
    std::unordered_set<int> indexSet;
    return dfs(s, wordDict, indexSet, 0);
}

// start表示的是从字符串s的哪个位置开始
bool dfs(const std::string& s, const std::vector<std::string>& wordDict, std::unordered_set<int>& indexSet, int start) {
    // 字符串都拆分完了，返回true
    if (start == s.length()) {
        return true;
    }

    for (int i = start + 1; i <= s.length(); i++) {
        // 如果已经判断过了，就直接跳过，防止重复判断
        if (indexSet.find(i) != indexSet.end()) {
            continue;
        }

        // 截取子串，判断是否是在字典中
        std::string sub = s.substr(start, i - start);
        if (std::find(wordDict.begin(), wordDict.end(), sub) != wordDict.end()) {
            if (dfs(s, wordDict, indexSet, i)) {
                return true;
            }
            // 标记为已判断过
            indexSet.insert(i);
        }
    }

    return false;
}

BFS算法解决：

这题除了DFS以外，还可以使用BFS，BFS就是一层一层的遍历，如下图所示

BFS（广度优先搜索）通常不需要递归，而是使用一个队列来记录每一层需要处理的值。在BFS中，当我们截取子串时，如果截取的子串存在于字典中，我们就记录截取的位置，并在下一层从这个位置的下一个位置继续截取。下面是相应的C++代码实现：

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>
#include <queue>

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict) {
    // 这里为了提高效率，把list转化为set，因为set的查找效率要比list高
    std::unordered_set<std::string> setDict(wordDict.begin(), wordDict.end());
    // 记录当前层开始遍历字符串s的位置
    std::queue<int> queue;
    queue.push(0);
    int length = s.length();

    while (!queue.empty()) {
        int index = queue.front();
        queue.pop();

        // 如果字符串到遍历完了，直接返回true
        if (index == length) {
            return true;
        }

        for (int i = index + 1; i <= length; i++) {
            // 截取子串，判断是否是在字典中
            std::string sub = s.substr(index, i - index);
            if (setDict.find(sub) != setDict.end()) {
                queue.push(i);
            }
        }
    }

    return false;
}

int main() {
    std::string s = "leetcode";
    std::vector<std::string> wordDict = {"leet", "code"};

    if (wordBreak(s, wordDict)) {
        std::cout << "The string can be segmented into words from the dictionary." << std::endl;
    } else {
        std::cout << "The string cannot be segmented into words from the dictionary." << std::endl;
    }

    return 0;
}

这种也会出现重复计算的情况，所以这里我们也可以使用一个变量来记录下。

#include <iostream>
#include <string>
#include <vector>
#include <unordered_set>
#include <queue>

bool wordBreak(const std::string& s, const std::vector<std::string>& wordDict) {
    // 这里为了提高效率，把list转化为set，因为set的查找效率要比list高
    std::unordered_set<std::string> setDict(wordDict.begin(), wordDict.end());
    // 记录当前层开始遍历字符串s的位置
    std::queue<int> queue;
    queue.push(0);
    int length = s.length();
    // 记录访问过的位置，减少重复判断
    std::vector<bool> visited(length, false);

    while (!queue.empty()) {
        int index = queue.front();
        queue.pop();

        // 如果字符串都遍历完了，直接返回true
        if (index == length) {
            return true;
        }

        // 如果被访问过，则跳过
        if (visited[index]) {
            continue;
        }

        // 标记为访问过
        visited[index] = true;

        for (int i = index + 1; i <= length; i++) {
            // 截取子串，判断是否是在字典中
            std::string sub = s.substr(index, i - index);
            if (setDict.find(sub) != setDict.end()) {
                queue.push(i);
            }
        }
    }

    return false;
}

int main() {
    std::string s = "leetcode";
    std::vector<std::string> wordDict = {"leet", "code"};

    if (wordBreak(s, wordDict)) {
        std::cout << "The string can be segmented into words from the dictionary." << std::endl;
    } else {
        std::cout << "The string cannot be segmented into words from the dictionary." << std::endl;
    }

    return 0;
}

查看全文

http://www.kler.cn/a/301531.html