
A Simple C++ Lexer Example

article 2025/2/6 23:30:44

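This article presents a simple C++ lexer that recognizes basic lexical units: identifiers, numbers, and operators. A lexer is a basic building block in compiler construction. It converts an input character stream into meaningful lexical units (tokens).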


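### Code explanation:
1. **`TokenType` enum**: defines the kinds of tokens: identifier, number, operator, end of input, and unknown character.
2. **`Token` struct**: represents one token, holding its type and its text value.
3. **`Lexer` class**: the lexer itself, which turns the input string into a stream of tokens.
   - `getNextToken()`: returns the next token.
   - `skipWhitespace()`: skips whitespace characters.
   - `parseIdentifier()`: reads an identifier.
   - `parseNumber()`: reads an integer literal.
   - `isOperator()`: checks whether a character is an operator.
4. **`main` function**: exercises the lexer on a fixed input string and prints each recognized token.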

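### Example output:
For the input `"int a = 123 + 456;"`, the program prints the lines below. `Token Type` is the numeric value of the `TokenType` enum (0 = identifier, 1 = number, 2 = operator, 3 = end of input, 4 = unknown), so `int` is reported as an identifier and `;` as an unknown character.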
```
Token Type: 0, Value: int
Token Type: 0, Value: a
Token Type: 2, Value: =
Token Type: 1, Value: 123
Token Type: 2, Value: +
Token Type: 1, Value: 456
Token Type: 4, Value: ;
Token Type: 3, Value:
```
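
The numeric `Token Type` values are compact but hard to read at a glance. As a small sketch, a helper along the following lines could map each enumerator to a readable name. `tokenTypeName` is a hypothetical function that is not part of the original listing, and the enum is repeated here only so the snippet compiles on its own.

```cpp
#include <string>

// Same enumerators as in the article's listing, repeated for self-containment.
enum TokenType { TOKEN_ID, TOKEN_NUM, TOKEN_OP, TOKEN_EOF, TOKEN_UNKNOWN };

// Hypothetical helper (an assumption, not in the original code):
// returns a short human-readable name for a token type.
const char* tokenTypeName(TokenType type) {
    switch (type) {
        case TOKEN_ID:      return "ID";
        case TOKEN_NUM:     return "NUM";
        case TOKEN_OP:      return "OP";
        case TOKEN_EOF:     return "EOF";
        case TOKEN_UNKNOWN: return "UNKNOWN";
    }
    return "INVALID";  // reached only if the value is not a valid enumerator
}
```

If the print statement in `main` used `tokenTypeName(token.type)` in place of `token.type`, the first output line would read `Token Type: ID, Value: int`.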

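### Notes:
- This lexer is a very basic implementation. A lexer in a real compiler is more complex and has to handle many more details, such as comments, string literals, and error reporting.
- The example does not handle floating-point numbers or multi-character operators (such as `==` and `!=`); both need to be added for practical use.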
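
As one possible direction for the two gaps named above, the sketch below scans simple decimal literals and prefers two-character operators over single characters. The free functions `scanNumber` and `scanOperator`, and the particular operator table, are assumptions made for illustration; they are not part of the original `Lexer` class, and they ignore exponents, suffixes, and longer operators.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Reads an integer or a simple decimal literal such as "3.14" starting at pos.
// pos is advanced past the characters that were consumed.
std::string scanNumber(const std::string& input, std::size_t& pos) {
    auto isDigitAt = [&](std::size_t i) {
        return i < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[i])) != 0;
    };

    std::string number;
    while (isDigitAt(pos)) number += input[pos++];

    // Accept a fractional part only when '.' is followed by at least one digit,
    // so "42." yields "42" and leaves the '.' for the caller.
    if (pos < input.size() && input[pos] == '.' && isDigitAt(pos + 1)) {
        number += input[pos++];
        while (isDigitAt(pos)) number += input[pos++];
    }
    return number;
}

// Reads one operator starting at pos, trying two-character forms first.
// Precondition: pos < input.size() and input[pos] is an operator character.
std::string scanOperator(const std::string& input, std::size_t& pos) {
    static const std::string twoCharOps[] = {"==", "!=", "<=", ">=", "&&", "||"};
    for (const std::string& op : twoCharOps) {
        if (input.compare(pos, op.size(), op) == 0) {
            pos += op.size();
            return op;
        }
    }
    return std::string(1, input[pos++]);  // fall back to a single character
}
```

Checking the longer forms first is the usual "longest match" rule: without it, `==` would be split into two `=` tokens.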

Hopefully this example helps you understand how to implement a simple lexer in C++!
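### Complete code:
```cpp
// Lexer.cpp : this file contains the "main" function. Program execution begins and ends there.
//

#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Token types
enum TokenType {
    TOKEN_ID,       // identifier
    TOKEN_NUM,      // number
    TOKEN_OP,       // operator
    TOKEN_EOF,      // end of input
    TOKEN_UNKNOWN   // unknown character
};

// A single token: its type and its text
struct Token {
    TokenType type;
    std::string value;
};

// The <cctype> functions have undefined behavior for negative char values,
// so every call goes through an unsigned char cast.
static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

// Lexer class
class Lexer {
public:
    explicit Lexer(const std::string& input) : input(input), pos(0) {}

    // Returns the next token
    Token getNextToken() {
        while (pos < input.size()) {
            char currentChar = input[pos];

            // Skip whitespace
            if (isSpace(currentChar)) {
                skipWhitespace();
                continue;
            }

            // Identifier: starts with a letter or '_' (parseIdentifier already
            // accepts '_' inside an identifier, so accept it at the start too)
            if (isAlpha(currentChar) || currentChar == '_') {
                return Token{ TOKEN_ID, parseIdentifier() };
            }

            // Number
            if (isDigit(currentChar)) {
                return Token{ TOKEN_NUM, parseNumber() };
            }

            // Operator (single character only)
            if (isOperator(currentChar)) {
                pos++;  // move to the next character
                return Token{ TOKEN_OP, std::string(1, currentChar) };
            }

            // Unknown character
            pos++;
            return Token{ TOKEN_UNKNOWN, std::string(1, currentChar) };
        }

        // End of input
        return Token{ TOKEN_EOF, "" };
    }

private:
    std::string input;
    std::size_t pos;

    // Skips whitespace characters
    void skipWhitespace() {
        while (pos < input.size() && isSpace(input[pos])) {
            pos++;
        }
    }

    // Reads an identifier: letters, digits, and underscores
    std::string parseIdentifier() {
        std::string identifier;
        while (pos < input.size() && (isAlnum(input[pos]) || input[pos] == '_')) {
            identifier += input[pos];
            pos++;
        }
        return identifier;
    }

    // Reads a run of decimal digits
    std::string parseNumber() {
        std::string number;
        while (pos < input.size() && isDigit(input[pos])) {
            number += input[pos];
            pos++;
        }
        return number;
    }

    // Checks whether a character is an operator
    static bool isOperator(char c) {
        static const std::string operators = "+-*/=<>!&|";
        return operators.find(c) != std::string::npos;
    }
};

int main() {
    std::string input = "int a = 123 + 456;";
    Lexer lexer(input);

    Token token;
    do {
        token = lexer.getNextToken();
        // token.type is an unscoped enum, so it prints as its integer value
        std::cout << "Token Type: " << token.type << ", Value: " << token.value << '\n';
    } while (token.type != TOKEN_EOF);

    return 0;
}
```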
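
In the example output, `int` is reported as an ordinary identifier because the lexer has no notion of keywords. One common way to add that, sketched here under assumptions, is to look up each identifier in a keyword set after it has been read. The helper `isKeyword` and the short keyword list are hypothetical and not part of the original listing; a separate token type for keywords would also have to be added to `TokenType`.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper (an assumption, not in the original code):
// reports whether an already-scanned identifier is a reserved word.
bool isKeyword(const std::string& word) {
    // A deliberately small subset of C++ keywords, for illustration only.
    static const std::unordered_set<std::string> keywords = {
        "int", "float", "char", "if", "else", "while", "for", "return"
    };
    return keywords.count(word) > 0;
}
```

In `getNextToken()`, the identifier branch could call `isKeyword` on the result of `parseIdentifier()` and pick the token type accordingly, leaving the scanning loop itself unchanged.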