The difference between torch:: and at:: factory functions
- Preface
- torch::autograd::THPVariable_rand
- torch::rand_symint
- at::rand_symint
- demo
- torch namespace
- at namespace
Preface
>>> import torch
>>> a = torch.rand(3, 4)
>>> a.requires_grad
False
>>> a = torch.rand(3, 4, requires_grad = True)
>>> a.requires_grad
True
In these two examples, torch.rand produces a differentiable or non-differentiable tensor depending on the requires_grad argument. Digging into the C++ layer underneath, we find that the two cases actually call factory functions from two different namespaces, torch:: and at::. In this post we read the source code and run a sample program to see how the tensors produced by these factory functions differ.
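As a preview, the C++ (LibTorch) analogue of the Python snippet above looks like this (a minimal sketch; the demo section later walks through a fuller program):

#include <torch/torch.h>
#include <iostream>

int main() {
    // Without torch::requires_grad(), the tensor is not differentiable.
    torch::Tensor a = torch::rand({3, 4});
    std::cout << a.requires_grad() << std::endl; // 0
    // Passing torch::requires_grad() makes it differentiable.
    a = torch::rand({3, 4}, torch::requires_grad());
    std::cout << a.requires_grad() << std::endl; // 1
    return 0;
}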
torch::autograd::THPVariable_rand
If you inspect the running program's backtrace with gdb, you will find that torch::autograd::THPVariable_rand is the first rand-related function reached after crossing from the Python world into the C++ world.
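A session might look like this (a sketch; it assumes a PyTorch build with debugging symbols, and the exact frames vary by version):

$ gdb --args python -c "import torch; torch.rand(3, 4)"
(gdb) break THPVariable_rand
(gdb) run
(gdb) backtrace

The function itself lives in a generated file: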
torch/csrc/autograd/generated/python_torch_functions_0.cpp
static PyObject * THPVariable_rand(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
  }, /*traceable=*/true);

  ParsedArgs<8> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch");
  }
  switch (_r.idx) {
    //...
    case 2: {
      if (_r.isNone(1)) {
        // aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
        const auto options = TensorOptions()
            .dtype(_r.scalartypeOptional(2))
            .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
            .layout(_r.layoutOptional(3))
            .requires_grad(_r.toBool(6))
            .pinned_memory(_r.toBool(5));
        torch::utils::maybe_initialize_cuda(options);
        auto dispatch_rand = [](c10::SymIntArrayRef size, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, options);
        };
        return wrap(dispatch_rand(_r.symintlist(0), options));
      } else {
        // aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
        check_out_type_matches(_r.tensor(1), _r.scalartypeOptional(2),
                               _r.isNone(2), _r.layoutOptional(3),
                               _r.deviceWithDefault(4, torch::tensors::get_default_device()), _r.isNone(4));
        auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return at::rand_symint_out(out, size);
        };
        return wrap(dispatch_rand_out(_r.tensor(1), _r.symintlist(0)).set_requires_grad(_r.toBool(6)));
      }
    }
    // ...
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
We called it as torch.rand(3, 4), that is, we provided only the size argument. Compare this against the four API signatures:
"rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
Of these, every rand signature except signature 2 (0-based) requires extra arguments such as generator or names, so execution enters case 2 of the switch.
Next, _r.isNone(1) checks whether argument 1 (0-based), that is, the out argument, is empty:

- If out was not provided, we enter the if branch, which calls torch::rand_symint and returns a differentiable at::Tensor.
- If out was provided, we enter the else branch, which calls at::rand_symint_out and returns a non-differentiable at::Tensor.

Since no out argument was provided here, we enter the if branch.
Also note the sixth argument, requires_grad. In the if branch it is parsed as follows, and the information is recorded in a TensorOptions object:
const auto options = TensorOptions()
    .dtype(_r.scalartypeOptional(2))
    .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
    .layout(_r.layoutOptional(3))
    .requires_grad(_r.toBool(6))
    .pinned_memory(_r.toBool(5));
The TensorOptions object is then passed to torch::rand_symint:
return torch::rand_symint(size, options);
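For illustration, user code can build the same kind of TensorOptions by hand and pass it to the torch:: factory function (a sketch; the dtype and device here are arbitrary choices, not what the parser derives):

// Record requires_grad in a TensorOptions, then hand it to torch::rand.
auto opts = at::TensorOptions()
                .dtype(at::kFloat)
                .device(at::kCPU)
                .requires_grad(true);
torch::Tensor t = torch::rand({3, 4}, opts);
std::cout << t.requires_grad() << std::endl; // 1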
The else branch first calls dispatch_rand_out to obtain an at::Tensor:
auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size) -> at::Tensor {
  pybind11::gil_scoped_release no_gil;
  return at::rand_symint_out(out, size);
};
and then uses set_requires_grad to make the result differentiable or not:
return wrap(dispatch_rand_out(_r.tensor(1), _r.symintlist(0)).set_requires_grad(_r.toBool(6)));
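User code reaches this path whenever torch.rand(..., out=...) is called from Python. The same two steps can be reproduced with the public ATen API (a sketch using the plain at::rand_out variant rather than the symint one):

// Fill a preallocated tensor, then flip requires_grad afterwards,
// mirroring the else branch above.
at::Tensor out = at::empty({3, 4});
at::rand_out(out, {3, 4});
out.set_requires_grad(true);
std::cout << out.requires_grad() << std::endl; // 1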
Next, let's look at the source of torch::rand_symint to see how it differs from at::rand_symint.
torch::rand_symint
torch/csrc/autograd/generated/variable_factories.h
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand_symint(size, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}
Here at::rand_symint is called first to obtain an at::Tensor, and then autograd::make_variable wraps the returned tensor.

at::Tensor inherits from at::TensorBase, which holds a c10::TensorImpl, and that TensorImpl carries a member variable autograd_meta_. Depending on its second argument requires_grad, autograd::make_variable calls c10::TensorImpl::set_autograd_meta to set autograd_meta_ either to null or to a non-trivial value. If autograd_meta_ is non-null, the returned Variable gains autograd capability.
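This behavior is easy to check from user code (a minimal sketch, assuming a LibTorch build where torch::autograd::make_variable is reachable through torch/torch.h):

#include <torch/torch.h>
#include <iostream>

int main() {
    at::Tensor base = at::rand({3, 4}); // plain ATen tensor
    // requires_grad=false leaves autograd_meta_ empty; true installs a non-trivial one.
    auto v1 = torch::autograd::make_variable(base.clone(), /*requires_grad=*/false);
    auto v2 = torch::autograd::make_variable(base.clone(), /*requires_grad=*/true);
    std::cout << v1.requires_grad() << std::endl; // 0
    std::cout << v2.requires_grad() << std::endl; // 1
    return 0;
}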
at::rand_symint
build/aten/src/ATen/Functions.h
// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options={}) {
  return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}

namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}
The at::rand_symint function simply calls at::_ops::rand::call and returns the result directly. The official documentation, PYTORCH C++ API - Autograd, corroborates this:
The at::Tensor class in ATen is not differentiable by default. To add the differentiability of tensors the autograd API provides, you must use tensor factory functions from the torch:: namespace instead of the at:: namespace. For example, while a tensor created with at::ones will not be differentiable, a tensor created with torch::ones will be.
In short: tensors produced by the factory functions under at:: have no autograd capability. To get a tensor with autograd support, use the factory functions under torch:: instead (and pass torch::requires_grad()).
demo
After installing LibTorch, create an autograd.cpp based on AUTOGRAD IN C++ FRONTEND:
#include <torch/torch.h>
#include <iostream>

int main() {
    // torch namespace
    torch::Tensor x = torch::ones({2, 2});
    std::cout << x << std::endl;
    std::cout << x.requires_grad() << std::endl; // 0
    // Passing torch::requires_grad() at construction makes requires_grad() true.
    x = torch::ones({2, 2}, torch::requires_grad());
    std::cout << x.requires_grad() << std::endl; // 1
    torch::Tensor y = x.mean();
    std::cout << y << std::endl;
    std::cout << y.requires_grad() << std::endl; // 1
    // For a non-leaf node, retain_grad() must be called beforehand
    // so that its gradient is kept during backpropagation.
    y.retain_grad(); // retain grad for non-leaf Tensor
    y.backward();
    std::cout << y.grad() << std::endl;
    std::cout << x.grad() << std::endl;

    // at namespace
    at::Tensor x1 = at::ones({2, 2});
    std::cout << x1.requires_grad() << std::endl; // 0
    at::Tensor y1 = x1.mean();
    std::cout << y1.requires_grad() << std::endl; // 0
    // y1.retain_grad(); // core dumped
    // An at::Tensor becomes differentiable after set_requires_grad.
    x1.set_requires_grad(true);
    std::cout << "after set requires grad: " << x1.requires_grad() << std::endl; // 1
    std::cout << y1.requires_grad() << std::endl; // 0
    // After x1 changed, y1 must be recomputed as well.
    y1 = x1.mean();
    std::cout << y1.requires_grad() << std::endl; // 1
    y1.retain_grad(); // retain grad for non-leaf Tensor
    y1.backward();
    std::cout << y1.grad() << std::endl;
    std::cout << x1.grad() << std::endl;
    return 0;
}
Write the following CMakeLists.txt:
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(autograd)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(autograd autograd.cpp)
target_link_libraries(autograd "${TORCH_LIBRARIES}")
set_property(TARGET autograd PROPERTY CXX_STANDARD 17)

# The following code block is suggested to be used on Windows.
# According to https://github.com/pytorch/pytorch/issues/25457,
# the DLLs need to be copied to avoid memory errors.
if (MSVC)
  file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
  add_custom_command(TARGET autograd
                     POST_BUILD
                     COMMAND ${CMAKE_COMMAND} -E copy_if_different
                     ${TORCH_DLLS}
                     $<TARGET_FILE_DIR:autograd>)
endif (MSVC)
Build and run (CMAKE_PREFIX_PATH should point at your LibTorch installation):
rm -rf * && cmake -DCMAKE_PREFIX_PATH=/root/Documents/installation/libtorch .. && make && ./autograd
A line-by-line analysis follows.
torch namespace
Create a torch::Tensor with a factory function from the torch namespace:
torch::Tensor x = torch::ones({2, 2});
std::cout << x << std::endl;
Output:
1 1
1 1
[ CPUFloatType{2,2} ]
Since torch::requires_grad() was not passed here, the tensor's requires_grad() is false:
std::cout << x.requires_grad() << std::endl; // 0
If torch::requires_grad() is passed at construction, requires_grad() becomes true:
x = torch::ones({2, 2}, torch::requires_grad());
std::cout << x.requires_grad() << std::endl; // 1
torch::Tensor y = x.mean();
std::cout << y << std::endl;
1
[ CPUFloatType{} ]
std::cout << y.requires_grad() << std::endl; // 1
For a non-leaf node, retain_grad() must be called beforehand so that its gradient is retained during backpropagation:
y.retain_grad(); // retain grad for non-leaf Tensor
y.backward();
std::cout << y.grad() << std::endl;
1
[ CPUFloatType{} ]
Without the preceding y.retain_grad(), calling y.grad() directly leads to a core dump:
[W TensorBody.h:489] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad)
[ Tensor (undefined) ]
terminate called after throwing an instance of 'c10::Error'
  what():  Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Exception raised from unpack at ../torch/csrc/autograd/saved_variable.cpp:136 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7faba17f4d47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7faba17ae0fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: torch::autograd::SavedVariable::unpack(std::shared_ptr<torch::autograd::Node>) const + 0x13b2 (0x7fab8f87d6c2 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::generated::MeanBackward0::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x98 (0x7fab8eb73998 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4d068cb (0x7fab8f8428cb in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xe8d (0x7fab8f83b94d in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #6: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x698 (0x7fab8f83cca8 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #7: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>, torch::autograd::InputBuffer&&) + 0x3dd (0x7fab8f8378bd in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #8: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0xa26 (0x7fab8f83a546 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x4ce0e81 (0x7fab8f81ce81 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #10: torch::autograd::backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, c10::optional<bool>, bool, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x5c (0x7fab8f81f88c in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4d447de (0x7fab8f8807de in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #12: at::Tensor::_backward(c10::ArrayRef<at::Tensor>, c10::optional<at::Tensor> const&, c10::optional<bool>, bool) const + 0x48 (0x7fab8c51b208 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x798a (0x5638af5ed98a in ./autograd)
frame #14: <unknown function> + 0x4d55 (0x5638af5ead55 in ./autograd)
frame #15: <unknown function> + 0x29d90 (0x7fab8a6e9d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: __libc_start_main + 0x80 (0x7fab8a6e9e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #17: <unknown function> + 0x4985 (0x5638af5ea985 in ./autograd)
Next, look at the gradient of x. Since y = x.mean() averages the four elements of x, each entry of the gradient is 1/4 = 0.25:
std::cout << x.grad() << std::endl;
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]
at namespace
Now create tensors with factory functions from the at namespace instead:
// at namespace
at::Tensor x1 = at::ones({2, 2});
std::cout << x1.requires_grad() << std::endl; // 0
What happens if we pass torch::requires_grad() to at::ones, the way we did with torch::ones? x1.requires_grad() is still 0. Recalling at::rand_symint, we can guess why: when forwarding to the underlying call, only the four options options.dtype_opt(), options.layout_opt(), options.device_opt(), and options.pinned_memory_opt() are consulted, while options.requires_grad() is ignored:
at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
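A quick check of this guess (a sketch; at::ones accepts a TensorOptions, so the call compiles, but the requires_grad flag is silently dropped):

at::Tensor z = at::ones({2, 2}, torch::requires_grad());
std::cout << z.requires_grad() << std::endl; // still 0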
Define the variable y1; its requires_grad starts out false:
at::Tensor y1 = x1.mean();
std::cout << y1.requires_grad() << std::endl; // 0
Because x1 and y1 are both non-differentiable at this point, attempting to call y1.retain_grad() causes a core dump:
terminate called after throwing an instance of 'c10::Error'
what(): can't retain_grad on Tensor that has requires_grad=False
Exception raised from retain_grad at ../torch/csrc/autograd/variable.cpp:503 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f7401f62d47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7f7401f1c0fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x4d4751f (0x7f73efff151f in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x4cef (0x560b61ca7cef in ./autograd)
frame #4: <unknown function> + 0x29d90 (0x7f73eae57d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: __libc_start_main + 0x80 (0x7f73eae57e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x4965 (0x560b61ca7965 in ./autograd)
Aborted (core dumped)
What if we want to make them differentiable? We can use the set_requires_grad function:
x1.set_requires_grad(true);
std::cout << "after set requires grad: " << x1.requires_grad() << std::endl; // 1
std::cout << y1.requires_grad() << std::endl; // 0
Notice that y1's requires_grad is still false. The autograd graph is recorded when an operation executes, so the y1 computed earlier does not retroactively pick up x1's change. After recomputing y1 as follows, its requires_grad becomes true as well:
y1 = x1.mean();
std::cout << y1.requires_grad() << std::endl; // 1
The call y1.retain_grad(); retains the gradient of a non-leaf tensor:

y1.retain_grad(); // retain grad for non-leaf Tensor

Its precondition is that the tensor's requires_grad is true. If the line y1 = x1.mean(); is omitted, y1's requires_grad is still false, so y1.retain_grad(); fails with:
terminate called after throwing an instance of 'c10::Error'
what(): can't retain_grad on Tensor that has requires_grad=False
Exception raised from retain_grad at ../torch/csrc/autograd/variable.cpp:503 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fafd2dfcd47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7fafd2db60fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x4d4751f (0x7fafc0e8b51f in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x4f77 (0x55f9a73dff77 in ./autograd)
frame #4: <unknown function> + 0x29d90 (0x7fafbbcf1d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: __libc_start_main + 0x80 (0x7fafbbcf1e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x4985 (0x55f9a73df985 in ./autograd)
Aborted (core dumped)
Start backpropagation, then inspect y1's gradient:
y1.backward();
std::cout << y1.grad() << std::endl;
1
[ CPUFloatType{} ]
If y1.retain_grad(); is commented out instead, y1's gradient is not retained: only an undefined tensor is printed, along with the following warning:
[W TensorBody.h:489] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad)
[ Tensor (undefined) ]
Finally, inspect x1's gradient:
std::cout << x1.grad() << std::endl;
The result is the same as in the torch:: version:
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]