The difference between torch:: and at:: factory functions
- Preface
- torch::autograd::THPVariable_rand
- torch::rand_symint
- at::rand_symint
- demo
- torch namespace
- at namespace
Preface
>>> import torch
>>> a = torch.rand(3, 4)
>>> a.requires_grad
False
>>> a = torch.rand(3, 4, requires_grad = True)
>>> a.requires_grad
True
In these two examples, torch.rand produces a differentiable or non-differentiable tensor depending on the requires_grad argument. Digging into the C++ layer underneath, we find that the two cases actually call factory functions from two different namespaces, torch:: and at::. In this post we read the source code and run a sample program to see how the tensors produced by these factory functions differ.
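As a preview, the C++ (LibTorch) analogue of the Python snippet above looks like this (a minimal sketch; the demo section later walks through a fuller program):

#include <torch/torch.h>
#include <iostream>

int main() {
    // Without torch::requires_grad(), the tensor is not differentiable.
    torch::Tensor a = torch::rand({3, 4});
    std::cout << a.requires_grad() << std::endl; // 0
    // Passing torch::requires_grad() makes it differentiable.
    a = torch::rand({3, 4}, torch::requires_grad());
    std::cout << a.requires_grad() << std::endl; // 1
    return 0;
}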
torch::autograd::THPVariable_rand
If you inspect the running program's backtrace with gdb, you will find that torch::autograd::THPVariable_rand is the first rand-related function reached after crossing from the Python world into the C++ world.
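A session might look like this (a sketch; it assumes a PyTorch build with debugging symbols, and the exact frames vary by version):

$ gdb --args python -c "import torch; torch.rand(3, 4)"
(gdb) break THPVariable_rand
(gdb) run
(gdb) backtrace

The function itself lives in a generated file: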
torch/csrc/autograd/generated/python_torch_functions_0.cpp
static PyObject * THPVariable_rand(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
  }, /*traceable=*/true);

  ParsedArgs<8> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch");
  }
  switch (_r.idx) {
    //...
    case 2: {
      if (_r.isNone(1)) {
        // aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
        const auto options = TensorOptions()
            .dtype(_r.scalartypeOptional(2))
            .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
            .layout(_r.layoutOptional(3))
            .requires_grad(_r.toBool(6))
            .pinned_memory(_r.toBool(5));
        torch::utils::maybe_initialize_cuda(options);
        auto dispatch_rand = [](c10::SymIntArrayRef size, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, options);
        };
        return wrap(dispatch_rand(_r.symintlist(0), options));
      } else {
        // aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
        check_out_type_matches(_r.tensor(1), _r.scalartypeOptional(2),
                               _r.isNone(2), _r.layoutOptional(3),
                               _r.deviceWithDefault(4, torch::tensors::get_default_device()), _r.isNone(4));
        auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return at::rand_symint_out(out, size);
        };
        return wrap(dispatch_rand_out(_r.tensor(1), _r.symintlist(0)).set_requires_grad(_r.toBool(6)));
      }
    }
    // ...
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
We called it as torch.rand(3, 4), that is, we provided only the size argument. Compare this against the four API signatures:
"rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
"rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
Of these, every rand signature except signature 2 (0-based) requires extra arguments such as generator or names, so execution enters case 2 of the switch.
Next, _r.isNone(1) checks whether argument 1 (0-based), that is, the out argument, is empty:

- If out was not provided, we enter the if branch, which calls torch::rand_symint and returns a differentiable at::Tensor.
- If out was provided, we enter the else branch, which calls at::rand_symint_out and returns a non-differentiable at::Tensor.

Since no out argument was provided here, we enter the if branch.
Also note the sixth argument, requires_grad. In the if branch it is parsed as follows, and the information is recorded in a TensorOptions object:
const auto options = TensorOptions()
    .dtype(_r.scalartypeOptional(2))
    .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
    .layout(_r.layoutOptional(3))
    .requires_grad(_r.toBool(6))
    .pinned_memory(_r.toBool(5));
The TensorOptions object is then passed to torch::rand_symint:
return torch::rand_symint(size, options);
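For illustration, user code can build the same kind of TensorOptions by hand and pass it to the torch:: factory function (a sketch; the dtype and device here are arbitrary choices, not what the parser derives):

// Record requires_grad in a TensorOptions, then hand it to torch::rand.
auto opts = at::TensorOptions()
                .dtype(at::kFloat)
                .device(at::kCPU)
                .requires_grad(true);
torch::Tensor t = torch::rand({3, 4}, opts);
std::cout << t.requires_grad() << std::endl; // 1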
The else branch first calls dispatch_rand_out to obtain an at::Tensor:
auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size) -> at::Tensor {
  pybind11::gil_scoped_release no_gil;
  return at::rand_symint_out(out, size);
};
and then uses set_requires_grad to make the result differentiable or not:
return wrap(dispatch_rand_out(_r.tensor(1), _r.symintlist(0)).set_requires_grad(_r.toBool(6)));
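User code reaches this path whenever torch.rand(..., out=...) is called from Python. The same two steps can be reproduced with the public ATen API (a sketch using the plain at::rand_out variant rather than the symint one):

// Fill a preallocated tensor, then flip requires_grad afterwards,
// mirroring the else branch above.
at::Tensor out = at::empty({3, 4});
at::rand_out(out, {3, 4});
out.set_requires_grad(true);
std::cout << out.requires_grad() << std::endl; // 1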
Next, let's look at the source of torch::rand_symint to see how it differs from at::rand_symint.
torch::rand_symint
torch/csrc/autograd/generated/variable_factories.h
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand_symint(size, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}
Here at::rand_symint is called first to obtain an at::Tensor, and then autograd::make_variable wraps the returned tensor.

at::Tensor inherits from at::TensorBase, which holds a c10::TensorImpl, and that TensorImpl carries a member variable autograd_meta_. Depending on its second argument requires_grad, autograd::make_variable calls c10::TensorImpl::set_autograd_meta to set autograd_meta_ either to null or to a non-trivial value. If autograd_meta_ is non-null, the returned Variable gains autograd capability.
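This behavior is easy to check from user code (a minimal sketch, assuming a LibTorch build where torch::autograd::make_variable is reachable through torch/torch.h):

#include <torch/torch.h>
#include <iostream>

int main() {
    at::Tensor base = at::rand({3, 4}); // plain ATen tensor
    // requires_grad=false leaves autograd_meta_ empty; true installs a non-trivial one.
    auto v1 = torch::autograd::make_variable(base.clone(), /*requires_grad=*/false);
    auto v2 = torch::autograd::make_variable(base.clone(), /*requires_grad=*/true);
    std::cout << v1.requires_grad() << std::endl; // 0
    std::cout << v2.requires_grad() << std::endl; // 1
    return 0;
}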
at::rand_symint
build/aten/src/ATen/Functions.h
// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options={}) {
  return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}

namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}
The at::rand_symint function simply calls at::_ops::rand::call and returns the result directly. The official documentation, PYTORCH C++ API - Autograd, corroborates this:
The at::Tensor class in ATen is not differentiable by default. To add the differentiability of tensors the autograd API provides, you must use tensor factory functions from the torch:: namespace instead of the at:: namespace. For example, while a tensor created with at::ones will not be differentiable, a tensor created with torch::ones will be.
In short: tensors produced by the factory functions under at:: have no autograd capability. To get a tensor with autograd support, use the factory functions under torch:: instead (and pass torch::requires_grad()).
demo
After installing LibTorch, create an autograd.cpp based on AUTOGRAD IN C++ FRONTEND:
#include <torch/torch.h>
#include <iostream>

int main() {
    // torch namespace
    torch::Tensor x = torch::ones({2, 2});
    std::cout << x << std::endl;
    std::cout << x.requires_grad() << std::endl; // 0
    // Passing torch::requires_grad() at construction makes requires_grad() true.
    x = torch::ones({2, 2}, torch::requires_grad());
    std::cout << x.requires_grad() << std::endl; // 1
    torch::Tensor y = x.mean();
    std::cout << y << std::endl;
    std::cout << y.requires_grad() << std::endl; // 1
    // For a non-leaf node, retain_grad() must be called beforehand
    // so that its gradient is kept during backpropagation.
    y.retain_grad(); // retain grad for non-leaf Tensor
    y.backward();
    std::cout << y.grad() << std::endl;
    std::cout << x.grad() << std::endl;

    // at namespace
    at::Tensor x1 = at::ones({2, 2});
    std::cout << x1.requires_grad() << std::endl; // 0
    at::Tensor y1 = x1.mean();
    std::cout << y1.requires_grad() << std::endl; // 0
    // y1.retain_grad(); // core dumped
    // An at::Tensor becomes differentiable after set_requires_grad.
    x1.set_requires_grad(true);
    std::cout << "after set requires grad: " << x1.requires_grad() << std::endl; // 1
    std::cout << y1.requires_grad() << std::endl; // 0
    // After x1 changed, y1 must be recomputed as well.
    y1 = x1.mean();
    std::cout << y1.requires_grad() << std::endl; // 1
    y1.retain_grad(); // retain grad for non-leaf Tensor
    y1.backward();
    std::cout << y1.grad() << std::endl;
    std::cout << x1.grad() << std::endl;
    return 0;
}
Write the following CMakeLists.txt:
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(autograd)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(autograd autograd.cpp)
target_link_libraries(autograd "${TORCH_LIBRARIES}")
set_property(TARGET autograd PROPERTY CXX_STANDARD 17)

# The following code block is suggested to be used on Windows.
# According to https://github.com/pytorch/pytorch/issues/25457,
# the DLLs need to be copied to avoid memory errors.
if (MSVC)
  file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
  add_custom_command(TARGET autograd
                     POST_BUILD
                     COMMAND ${CMAKE_COMMAND} -E copy_if_different
                     ${TORCH_DLLS}
                     $<TARGET_FILE_DIR:autograd>)
endif (MSVC)
Build and run (CMAKE_PREFIX_PATH should point at your LibTorch installation):
rm -rf * && cmake -DCMAKE_PREFIX_PATH=/root/Documents/installation/libtorch .. && make && ./autograd
A line-by-line analysis follows.
torch namespace
Create a torch::Tensor with a factory function from the torch namespace:
torch::Tensor x = torch::ones({2, 2});
std::cout << x << std::endl;
Output:
1 1
1 1
[ CPUFloatType{2,2} ]
Since torch::requires_grad() was not passed here, the tensor's requires_grad() is false:
std::cout << x.requires_grad() << std::endl; // 0
If torch::requires_grad() is passed at construction, requires_grad() becomes true:
x = torch::ones({2, 2}, torch::requires_grad());
std::cout << x.requires_grad() << std::endl; // 1
torch::Tensor y = x.mean();
std::cout << y << std::endl;
1
[ CPUFloatType{} ]
std::cout << y.requires_grad() << std::endl; // 1
For a non-leaf node, retain_grad() must be called beforehand so that its gradient is retained during backpropagation:
y.retain_grad(); // retain grad for non-leaf Tensor
y.backward();
std::cout << y.grad() << std::endl;
1
[ CPUFloatType{} ]
Without the preceding y.retain_grad(), calling y.grad() directly leads to a core dump:
[W TensorBody.h:489] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad)
[ Tensor (undefined) ]
terminate called after throwing an instance of 'c10::Error'
  what():  Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Exception raised from unpack at ../torch/csrc/autograd/saved_variable.cpp:136 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7faba17f4d47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7faba17ae0fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: torch::autograd::SavedVariable::unpack(std::shared_ptr<torch::autograd::Node>) const + 0x13b2 (0x7fab8f87d6c2 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::generated::MeanBackward0::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x98 (0x7fab8eb73998 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4d068cb (0x7fab8f8428cb in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xe8d (0x7fab8f83b94d in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #6: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x698 (0x7fab8f83cca8 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #7: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>, torch::autograd::InputBuffer&&) + 0x3dd (0x7fab8f8378bd in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #8: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0xa26 (0x7fab8f83a546 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x4ce0e81 (0x7fab8f81ce81 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #10: torch::autograd::backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, c10::optional<bool>, bool, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x5c (0x7fab8f81f88c in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4d447de (0x7fab8f8807de in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #12: at::Tensor::_backward(c10::ArrayRef<at::Tensor>, c10::optional<at::Tensor> const&, c10::optional<bool>, bool) const + 0x48 (0x7fab8c51b208 in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x798a (0x5638af5ed98a in ./autograd)
frame #14: <unknown function> + 0x4d55 (0x5638af5ead55 in ./autograd)
frame #15: <unknown function> + 0x29d90 (0x7fab8a6e9d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: __libc_start_main + 0x80 (0x7fab8a6e9e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #17: <unknown function> + 0x4985 (0x5638af5ea985 in ./autograd)
Next, look at the gradient of x. Since y = x.mean() averages the four elements of x, each entry of the gradient is 1/4 = 0.25:
std::cout << x.grad() << std::endl;
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]
at namespace
Now create tensors with factory functions from the at namespace instead:
// at namespace
at::Tensor x1 = at::ones({2, 2});
std::cout << x1.requires_grad() << std::endl; // 0
What happens if we pass torch::requires_grad() to at::ones, the way we did with torch::ones? x1.requires_grad() is still 0. Recalling at::rand_symint, we can guess why: when forwarding to the underlying call, only the four options options.dtype_opt(), options.layout_opt(), options.device_opt(), and options.pinned_memory_opt() are consulted, while options.requires_grad() is ignored:
at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
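A quick check of this guess (a sketch; at::ones accepts a TensorOptions, so the call compiles, but the requires_grad flag is silently dropped):

at::Tensor z = at::ones({2, 2}, torch::requires_grad());
std::cout << z.requires_grad() << std::endl; // still 0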
Define the variable y1; its requires_grad starts out false:
at::Tensor y1 = x1.mean();
std::cout << y1.requires_grad() << std::endl; // 0
Because x1 and y1 are both non-differentiable at this point, attempting to call y1.retain_grad() causes a core dump:
terminate called after throwing an instance of 'c10::Error'
what(): can't retain_grad on Tensor that has requires_grad=False
Exception raised from retain_grad at ../torch/csrc/autograd/variable.cpp:503 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f7401f62d47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7f7401f1c0fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x4d4751f (0x7f73efff151f in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x4cef (0x560b61ca7cef in ./autograd)
frame #4: <unknown function> + 0x29d90 (0x7f73eae57d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: __libc_start_main + 0x80 (0x7f73eae57e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x4965 (0x560b61ca7965 in ./autograd)
Aborted (core dumped)
What if we want to make them differentiable? We can use the set_requires_grad function:
x1.set_requires_grad(true);
std::cout << "after set requires grad: " << x1.requires_grad() << std::endl; // 1
std::cout << y1.requires_grad() << std::endl; // 0
Notice that y1's requires_grad is still false. The autograd graph is recorded when an operation executes, so the y1 computed earlier does not retroactively pick up x1's change. After recomputing y1 as follows, its requires_grad becomes true as well:
y1 = x1.mean();
std::cout << y1.requires_grad() << std::endl; // 1
The call y1.retain_grad(); retains the gradient of a non-leaf tensor:

y1.retain_grad(); // retain grad for non-leaf Tensor

Its precondition is that the tensor's requires_grad is true. If the line y1 = x1.mean(); is omitted, y1's requires_grad is still false, so y1.retain_grad(); fails with:
terminate called after throwing an instance of 'c10::Error'
what(): can't retain_grad on Tensor that has requires_grad=False
Exception raised from retain_grad at ../torch/csrc/autograd/variable.cpp:503 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fafd2dfcd47 in /root/Documents/installation/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7fafd2db60fc in /root/Documents/installation/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x4d4751f (0x7fafc0e8b51f in /root/Documents/installation/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x4f77 (0x55f9a73dff77 in ./autograd)
frame #4: <unknown function> + 0x29d90 (0x7fafbbcf1d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: __libc_start_main + 0x80 (0x7fafbbcf1e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x4985 (0x55f9a73df985 in ./autograd)
Aborted (core dumped)
Start backpropagation, then inspect y1's gradient:
y1.backward();
std::cout << y1.grad() << std::endl;
1
[ CPUFloatType{} ]
If y1.retain_grad(); is commented out instead, y1's gradient is not retained: only an undefined tensor is printed, along with the following warning:
[W TensorBody.h:489] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad)
[ Tensor (undefined) ]
Finally, inspect x1's gradient:
std::cout << x1.grad() << std::endl;
The result is the same as in the torch:: version:
0.2500 0.2500
0.2500 0.2500
[ CPUFloatType{2,2} ]