
Full text - MLIR Toy Tutorial Chapter 1: Toy Language and AST

article 2025/3/31 0:29:19

1. Full translation

The Toy language

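This tutorial is illustrated with a toy language that we call Toy. Toy is a tensor-based language that lets you define functions, perform some math computations, and print results. To keep the tutorial simple, the codegen is limited to tensors of rank <= 2, and the only data type in Toy is a 64-bit floating point type (double in C). As a result, all values are implicitly double precision. Values are immutable, meaning that every operation returns a newly allocated value, and deallocation is managed automatically. That is enough description; nothing helps more than walking through an example to understand what MLIR is for and how it works.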

def main() {
  # Define a variable `a` with shape <2, 3>, initialized with the literal value.
  # The shape is inferred from the supplied literal.
  var a = [[1, 2, 3], [4, 5, 6]];

  # b is identical to a, the literal tensor is implicitly reshaped: defining new
  # variables is the way to reshape tensors (element count must match).
  var b<2, 3> = [1, 2, 3, 4, 5, 6];

  # transpose() and print() are the only builtins, the following will transpose
  # a and b and perform an element-wise multiplication before printing the result.
  print(transpose(a) * transpose(b));
}
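
To make the semantics of this first program concrete, here is a rough Python sketch of what it computes. This is not how Toy is implemented; the helper names `transpose`, `reshape`, and `elementwise_mul` are chosen for illustration only, and tensors are modeled as plain nested lists.

```python
def transpose(t):
    """Swap rows and columns of a rank-2 tensor stored as a list of rows."""
    return [list(col) for col in zip(*t)]


def reshape(flat, rows, cols):
    """Reshape a flat list into rows x cols; the element count must match."""
    if len(flat) != rows * cols:
        raise ValueError("element count does not match the requested shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def elementwise_mul(x, y):
    """Multiply two tensors of identical shape element by element."""
    return [[p * q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


# var a = [[1, 2, 3], [4, 5, 6]];
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# var b<2, 3> = [1, 2, 3, 4, 5, 6];  (the flat literal is reshaped to <2, 3>)
b = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3)

# print(transpose(a) * transpose(b));
result = elementwise_mul(transpose(a), transpose(b))
print(result)
```

Because `a` and `b` hold the same values, the result is a <3, 2> tensor whose entries are the squares of the transposed input.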

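Type checking is performed statically through type inference; the language only requires you to specify tensor shapes where necessary. Functions are generic: their parameters are unranked, that is, we know each parameter is a tensor, but we do not know its dimensions. A function is specialized for every newly discovered signature at its call sites. Let's go through the previous example again, this time with a user-defined function added: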

# User defined generic function that operates on unknown shaped arguments.
def multiply_transpose(a, b) {
  return transpose(a) * transpose(b);
}

def main() {
  # Define a variable `a` with shape <2, 3>, initialized with the literal value.
  var a = [[1, 2, 3], [4, 5, 6]];
  var b<2, 3> = [1, 2, 3, 4, 5, 6];

  # This call will specialize `multiply_transpose` with <2, 3> for both
  # arguments and deduce a return type of <3, 2> in initialization of `c`.
  var c = multiply_transpose(a, b);

  # A second call to `multiply_transpose` with <2, 3> for both arguments will
  # reuse the previously specialized and inferred version and return <3, 2>.
  var d = multiply_transpose(b, a);

  # A new call with <3, 2> (instead of <2, 3>) for both arguments will
  # trigger another specialization of `multiply_transpose`.
  var e = multiply_transpose(c, d);

  # Finally, calling into `multiply_transpose` with incompatible shapes
  # (<2, 3> and <3, 2>) will trigger a shape inference error.
  var f = multiply_transpose(a, c);
}
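
The specialization behavior described in the comments can be modeled with a small cache keyed by argument shapes. The sketch below is only an illustration in Python: chapter 1 of the tutorial stops at the AST, and the names `specializations`, `infer_multiply_transpose`, and `call_multiply_transpose` are hypothetical, not part of the Toy code base.

```python
# Maps a call signature (tuple of argument shapes) to the inferred return shape.
specializations = {}


def transpose_shape(shape):
    """transpose() reverses the dimensions of a shape."""
    return tuple(reversed(shape))


def infer_multiply_transpose(a_shape, b_shape):
    """Infer the result shape of transpose(a) * transpose(b)."""
    ta, tb = transpose_shape(a_shape), transpose_shape(b_shape)
    if ta != tb:
        # Element-wise multiplication needs identical operand shapes.
        raise TypeError(f"shape inference error: {ta} vs {tb}")
    return ta


def call_multiply_transpose(a_shape, b_shape):
    """Specialize on first use of a signature, reuse the result afterwards."""
    key = (a_shape, b_shape)
    if key not in specializations:
        specializations[key] = infer_multiply_transpose(a_shape, b_shape)
    return specializations[key]


a = b = (2, 3)
c = call_multiply_transpose(a, b)  # new specialization for (<2, 3>, <2, 3>)
d = call_multiply_transpose(b, a)  # same signature, cached version is reused
e = call_multiply_transpose(c, d)  # new specialization for (<3, 2>, <3, 2>)

try:
    call_multiply_transpose(a, c)  # <2, 3> with <3, 2>: incompatible
    failed = False
except TypeError:
    failed = True
```

After these calls the cache holds exactly two specializations, matching the two distinct signatures in `main`, and the last call is rejected.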

Dumping the AST

The AST generated from the code above is fairly straightforward. Here is a dump of it:

Module:
  Function 
    Proto 'multiply_transpose' @test/Examples/Toy/Ch1/ast.toy:4:1
    Params: [a, b]
    Block {
      Return
        BinOp: * @test/Examples/Toy/Ch1/ast.toy:5:25
          Call 'transpose' [ @test/Examples/Toy/Ch1/ast.toy:5:10
            var: a @test/Examples/Toy/Ch1/ast.toy:5:20
          ]
          Call 'transpose' [ @test/Examples/Toy/Ch1/ast.toy:5:25
            var: b @test/Examples/Toy/Ch1/ast.toy:5:35
          ]
    } // Block
  Function 
    Proto 'main' @test/Examples/Toy/Ch1/ast.toy:8:1
    Params: []
    Block {
      VarDecl a<> @test/Examples/Toy/Ch1/ast.toy:11:3
        Literal: <2, 3>[ <3>[ 1.000000e+00, 2.000000e+00, 3.000000e+00], <3>[ 4.000000e+00, 5.000000e+00, 6.000000e+00]] @test/Examples/Toy/Ch1/ast.toy:11:11
      VarDecl b<2, 3> @test/Examples/Toy/Ch1/ast.toy:15:3
        Literal: <6>[ 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00] @test/Examples/Toy/Ch1/ast.toy:15:17
      VarDecl c<> @test/Examples/Toy/Ch1/ast.toy:19:3
        Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:19:11
          var: a @test/Examples/Toy/Ch1/ast.toy:19:30
          var: b @test/Examples/Toy/Ch1/ast.toy:19:33
        ]
      VarDecl d<> @test/Examples/Toy/Ch1/ast.toy:22:3
        Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:22:11
          var: b @test/Examples/Toy/Ch1/ast.toy:22:30
          var: a @test/Examples/Toy/Ch1/ast.toy:22:33
        ]
      VarDecl e<> @test/Examples/Toy/Ch1/ast.toy:25:3
        Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:25:11
          var: c @test/Examples/Toy/Ch1/ast.toy:25:30
          var: d @test/Examples/Toy/Ch1/ast.toy:25:33
        ]
      VarDecl f<> @test/Examples/Toy/Ch1/ast.toy:28:3
        Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:28:11
          var: a @test/Examples/Toy/Ch1/ast.toy:28:30
          var: c @test/Examples/Toy/Ch1/ast.toy:28:33
        ]
    } // Block

You can regenerate this AST from the example code in the examples/toy/Ch1/ directory. Try running:

path/to/BUILD/bin/toyc-ch1 test/Examples/Toy/Ch1/ast.toy -emit=ast

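The lexer code is fairly simple; all of it lives in a single header file:
examples/toy/Ch1/include/toy/Lexer.h. The parser can be found in examples/toy/Ch1/include/toy/Parser.h; it is a recursive descent parser. If you are not familiar with this kind of lexer/parser, the code is very similar to that of LLVM Kaleidoscope, which is explained in detail in the first two chapters of that tutorial: https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html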
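
As a rough idea of what a lexer for Toy has to do, here is a minimal Python sketch. It is not the code in Lexer.h, which is a hand-written C++ lexer; a regular expression is swapped in here for brevity, and the token kinds (`keyword`, `identifier`, `number`, `punct`) are naming assumptions made for this example.

```python
import re

# One alternative per token class; comments and whitespace are skipped.
TOKEN_RE = re.compile(r"""
    (?P<comment>\#[^\n]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<punct>[()\[\]{}<>,;=*+\-])
  | (?P<space>\s+)
""", re.VERBOSE)

KEYWORDS = {"def", "var", "return"}


def tokenize(source):
    """Turn Toy source text into a list of (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue  # not meaningful to the parser
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("var b<2, 3> = [1, 2]; # reshape")
for token in tokens:
    print(token)
```

The parser then only ever looks at this token stream, never at raw characters, which is what keeps a recursive descent parser readable.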

2. Stepping through the toyc-ch1 source code with gdb

2.1 Building a debug toyc-ch1

The following steps build debug versions of the toyc-chX binaries:

$ cd llvm-project
$ git checkout llvmorg-16.0.1
$ mkdir build
$ cd build
 
$ cmake -G Ninja ../llvm \
   -DCMAKE_INSTALL_PREFIX=../../locald_16.0.6_x86_mlir \
   -DLLVM_ENABLE_PROJECTS="clang;lld;mlir" \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="Native;NVPTX;AMDGPU" \
   -DCMAKE_BUILD_TYPE=Debug \
   -DLLVM_USE_SPLIT_DWARF=ON \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DLLVM_INSTALL_UTILS=ON
$ ninja

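The executables end up in llvm-project/build/bin/ (run ls bin/ from the build directory to list them).

You can then run the binary directly, for example with the -emit=ast command shown in section 1.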

3. Part of the toyc-ch1 source logic

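main
    parseInputFile()
        Parser parser(lexer); // once the source file has been loaded into the lexer, the parser holds the lexer via its constructor.
          parser.parseModule(); // parses the source top-down, starting at the module level; tokens are handled with plain hand-written code rather than a formal state machine.
              Parser::parseDefinition() // called in a loop, each call parses one Toy function (prototype and body); returns a pointer to the function's AST
                  Parser::parsePrototype() // parses the function prototype according to the grammar and returns a PrototypeAST
                  Parser::parseBlock() // parses the function body according to the grammar and returns an ExprASTList
                  builds a FunctionAST and returns it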
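
The call tree above can be sketched in Python as follows. This is only a simplified model under stated assumptions: the real parser is C++ and builds typed AST nodes, whereas here plain dictionaries stand in for the AST, `parse_block` just collects raw tokens between braces, and the `tokenize` helper and method names are invented for this example.

```python
import re


def tokenize(src):
    """Very rough tokenizer: identifiers, numbers, or single characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_module(self):
        """Loop over definitions until the input is exhausted."""
        functions = []
        while self.peek() is not None:
            functions.append(self.parse_definition())
        return {"kind": "Module", "functions": functions}

    def parse_definition(self):
        """One Toy function: prototype followed by a block."""
        proto = self.parse_prototype()
        body = self.parse_block()
        return {"kind": "Function", "proto": proto, "body": body}

    def parse_prototype(self):
        self.expect("def")
        name = self.advance()
        self.expect("(")
        params = []
        while self.peek() not in (")", None):
            params.append(self.advance())
            if self.peek() == ",":
                self.advance()
        self.expect(")")
        return {"name": name, "params": params}

    def parse_block(self):
        """Simplification: keep the body as raw tokens instead of expression ASTs."""
        self.expect("{")
        body = []
        while self.peek() not in ("}", None):
            body.append(self.advance())
        self.expect("}")
        return body


src = """
def multiply_transpose(a, b) { return transpose(a) * transpose(b); }
def main() { var c = multiply_transpose(a, b); }
"""
module = Parser(tokenize(src)).parse_module()
names = [f["proto"]["name"] for f in module["functions"]]
print(names)
```

Each grammar rule maps to one method, and each method calls the methods for the rules it contains; that one-to-one mapping is the defining trait of recursive descent.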




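Parser::parsePrototype()
    match the def token: tok_def
    match the function-name token: tok_identifier; save the function name as a string
    match the start of the parameter list: (
    parse the parameter list:
        match a parameter-name token: tok_identifier; build a variable expression AST from the name and push it onto a vector
        match the parameter separator: ,
        repeat for the next parameter name
    match the end of the parameter list: )
    build the function prototype AST: PrototypeAST [from the function name + the list of parameter ASTs]
    return the prototype AST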
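
The steps listed for parsePrototype() can be written out as a small Python function. This is a sketch, not a port of Parser.h: the dataclasses reuse the names PrototypeAST and VariableExprAST from the notes, but their fields are simplified (no source locations), and the function works on a pre-split list of token strings, which is an assumption made for this example.

```python
import re
from dataclasses import dataclass

IDENT = re.compile(r"[A-Za-z_]\w*$")


@dataclass
class VariableExprAST:
    name: str


@dataclass
class PrototypeAST:
    name: str
    args: list


def parse_prototype(tokens, pos=0):
    """Parse `def name(arg, ...)` and return (PrototypeAST, next position)."""

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def take_identifier():
        tok = take()
        if not IDENT.match(tok):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    take("def")                      # tok_def
    name = take_identifier()         # function name
    take("(")                        # start of the parameter list
    args = []
    if pos < len(tokens) and tokens[pos] != ")":
        while True:
            # Each parameter becomes a variable expression node.
            args.append(VariableExprAST(take_identifier()))
            if pos < len(tokens) and tokens[pos] == ",":
                take(",")            # separator, another parameter must follow
                continue
            break
    take(")")                        # end of the parameter list
    return PrototypeAST(name, args), pos


proto, next_pos = parse_prototype(["def", "multiply_transpose", "(", "a", ",", "b", ")", "{"])
empty_proto, _ = parse_prototype(["def", "main", "(", ")"])
print(proto)
```

Returning the next position alongside the node lets the caller continue with the function body right where the prototype ended.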

Parser::parseBlock() // parses the function body according to the grammar and returns an ExprASTList
    match {
    match var
    parseDeclaration(): parse the variable declaration // returns a VarDeclExprAST

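Parser::parseDeclaration() // returns a VarDeclExprAST
    match the variable-name token; save the name as the string id
    match < (only present when a shape is given, as in var b<2, 3>)
    parse the variable's shape with Parser::parseType() and store it in type (a VarType, essentially an int64 vector)
    match the assignment sign =
    parse the initializer with Parser::parseExpression(); the resulting ExprAST is stored in expr
    build a VarDeclExprAST from id, type, and expr, and return it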

    parseReturn();
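
The parseDeclaration() steps can be sketched in Python as well. Again this is an illustration under assumptions rather than the real implementation: the initializer is kept as a list of raw tokens instead of a parsed ExprAST, the `tokenize` helper is invented for this example, and the shape is treated as optional because the AST dump in section 1 shows `a<>` for a declaration without one.

```python
import re
from dataclasses import dataclass


@dataclass
class VarDeclExprAST:
    name: str
    shape: list   # stand-in for VarType: a list of integer dimensions
    init: list    # raw initializer tokens, a stand-in for the ExprAST


def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


def parse_type(tokens, pos):
    """Parse `<d0, d1, ...>` starting at `<`; return (shape, next position)."""
    if tokens[pos] != "<":
        raise SyntaxError("expected '<'")
    pos += 1
    shape = []
    while pos < len(tokens) and tokens[pos] != ">":
        shape.append(int(tokens[pos]))
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
    if pos >= len(tokens):
        raise SyntaxError("expected '>'")
    return shape, pos + 1


def parse_declaration(tokens, pos=0):
    """Parse `var name[<shape>] = initializer` up to, not including, `;`."""
    if tokens[pos] != "var":
        raise SyntaxError("expected 'var'")
    name = tokens[pos + 1]
    pos += 2
    shape = []                       # empty shape means "to be inferred"
    if tokens[pos] == "<":
        shape, pos = parse_type(tokens, pos)
    if tokens[pos] != "=":
        raise SyntaxError("expected '='")
    pos += 1
    init = []
    while pos < len(tokens) and tokens[pos] != ";":
        init.append(tokens[pos])
        pos += 1
    return VarDeclExprAST(name, shape, init), pos


b_tokens = tokenize("var b<2, 3> = [1, 2, 3, 4, 5, 6];")
decl_b, end_b = parse_declaration(b_tokens)
decl_a, _ = parse_declaration(tokenize("var a = [[1, 2, 3], [4, 5, 6]];"))
print(decl_b.name, decl_b.shape)
print(decl_a.name, decl_a.shape)
```

Leaving the shape empty for `a` mirrors the `a<>` entry in the dump: the parser records that no shape was written and leaves inference to a later stage.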
