Torch Automatic Differentiation
Table of Contents
- Torch Automatic Differentiation
- 1. Chain Rule
- 1.1 Principle
- 1.2 Example
- 2. Traditional Differentiation
- 3. Automatic Differentiation
- 3.1 Principle
- 3.2 Code Implementation
1. Chain Rule
1.1 Principle
Scalar chain rule:
$$
y=f(u),\quad u=g(x),\qquad \frac{\partial y}{\partial x} =\frac{\partial y}{\partial u}\frac{\partial u}{\partial x}
$$
Extending to vectors (the subscripts indicate the shape of each term):
$$
\begin{array}{l}
\frac{\partial y}{\partial \vec{x}}_{1\times n} = \frac{\partial y}{\partial u}\,\frac{\partial u}{\partial \vec{x}}_{1\times n} \\
\frac{\partial y}{\partial \vec{x}}_{1\times n} = \frac{\partial y}{\partial \vec{u}}_{1\times m}\,\frac{\partial \vec{u}}{\partial \vec{x}}_{m\times n} \\
\frac{\partial \vec{y}}{\partial \vec{x}}_{m\times n} = \frac{\partial \vec{y}}{\partial \vec{u}}_{m\times k}\,\frac{\partial \vec{u}}{\partial \vec{x}}_{k\times n}
\end{array}
$$
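To make these shapes concrete, here is a small illustration (the matrix size and values are chosen arbitrarily) using torch.autograd.functional.jacobian: for the linear map $\vec{y}=A\vec{x}$ with $A\in\mathbb{R}^{3\times 4}$, the Jacobian $\frac{\partial \vec{y}}{\partial \vec{x}}$ is $3\times 4$ and equals $A$.
import torch
from torch.autograd.functional import jacobian
# Shapes chosen for illustration: y = A @ x maps R^4 -> R^3,
# so the Jacobian dy/dx should be 3 x 4 (m x n).
A = torch.randn(3, 4)
x = torch.randn(4)
def f(x):
    return A @ x
J = jacobian(f, x)
print(J.shape)               # torch.Size([3, 4])
print(torch.allclose(J, A))  # True: for a linear map the Jacobian is A itself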
1.2 Example
For example, consider:
$$
\vec{x},\vec{w}\in \mathbb{R}^{n},\quad y\in \mathbb{R},\qquad z=(\langle \vec{x},\vec{w}\rangle - y)^{2}
$$
We want to compute:
$$
\frac{\partial z}{\partial \vec{w}}
$$
To do so, first perform a change of variables:
$$
\begin{array}{l}
a=\langle \vec{x},\vec{w}\rangle \\
b=a-y \\
z=b^{2}
\end{array}
$$
Then apply the chain rule:
$$
\begin{aligned}
\frac{\partial z}{\partial \vec{w}} &=\frac{\partial z}{\partial b}\,\frac{\partial b}{\partial a}\,\frac{\partial a}{\partial \vec{w}} \\
&=\frac{\partial b^{2}}{\partial b}\,\frac{\partial (a-y)}{\partial a}\,\frac{\partial \langle \vec{x},\vec{w}\rangle}{\partial \vec{w}} \\
&=2b \cdot 1 \cdot \vec{x}^{T} \\
&=2(\langle \vec{x},\vec{w}\rangle - y)\,\vec{x}^{T}
\end{aligned}
$$
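As a quick sanity check (a minimal sketch; the concrete values of x, w and y are arbitrary), PyTorch's autograd, introduced in section 3 below, reproduces the analytic gradient $2(\langle \vec{x},\vec{w}\rangle - y)\vec{x}$:
import torch
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
y = torch.tensor(4.0)
inner = torch.dot(x, w)          # <x, w>
z = (inner - y) ** 2
z.backward()
analytic = 2 * (inner.detach() - y) * x  # 2(<x,w> - y) x
print(torch.allclose(w.grad, analytic))  # True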
2. Traditional Differentiation
Automatic differentiation computes the derivative of a function at a specified value. It differs from symbolic differentiation, and from numerical differentiation based on the limit definition
$$
\frac{\partial f(x)}{\partial x} =\lim_{ h \to 0 } \frac{f(x+h)-f(x)}{h}
$$
Both of these traditional approaches are demonstrated below:
The sympy library (symbolic differentiation):
from sympy import symbols, diff
x = symbols("x")      # symbolic variable
f = x**2 + 3*x + 2    # symbolic expression
df_dx = diff(f, x)    # symbolic derivative: 2*x + 3
print(df_dx)
print("Derivative of f at x=1:", df_dx.subs(x, 1))
The scipy library (numerical differentiation):
from scipy.differentiate import derivative
# Define the function
def f(x):
    return x**2 + 3*x + 2
# Compute the derivative of the function at a point
# (scipy.differentiate is a relatively recent SciPy module)
x = 1
df_dx = derivative(f, x)
print("Derivative of f(x) at x=1:", df_dx.df)
print(df_dx)
Computing the derivative directly from the limit definition:
# Define the function
def f(x):
    return x**2 + 3*x + 2
# Approximate the derivative at a point with a small finite step
x = 1
h = 1e-6
df_dx = (f(x + h) - f(x)) / h
print(df_dx)
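A caveat worth noting (this illustration is an addition, not part of the original code): the step size h trades truncation error against floating-point round-off, so making h ever smaller eventually makes the estimate worse:
def f(x):
    return x**2 + 3*x + 2
exact = 5.0  # f'(x) = 2x + 3, so f'(1) = 5
for h in (1e-2, 1e-6, 1e-10, 1e-14):
    approx = (f(1 + h) - f(1)) / h
    print(f"h={h:.0e}  error={abs(approx - exact):.2e}")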
3. Automatic Differentiation
3.1 Principle
The idea:
- decompose the code into primitive operators;
- represent the computation as a directed acyclic graph.
Automatic differentiation has two accumulation modes, both built on the chain rule (a minimal forward-mode sketch follows this list):
- Chain rule: $\frac{\partial y}{\partial x} =\frac{\partial y}{\partial u_{n}}\frac{\partial u_{n}}{\partial u_{n-1}}\cdots\frac{\partial u_{2}}{\partial u_{1}}\frac{\partial u_{1}}{\partial x}$
- Forward accumulation (evaluates the chain from the input outward): $\frac{\partial y}{\partial x} =\frac{\partial y}{\partial u_{n}}\left(\frac{\partial u_{n}}{\partial u_{n-1}}\left(\cdots\left(\frac{\partial u_{2}}{\partial u_{1}}\frac{\partial u_{1}}{\partial x}\right)\right)\right)$
- Reverse accumulation, i.e. backpropagation (evaluates the chain from the output backward): $\frac{\partial y}{\partial x} =\left(\left(\left(\frac{\partial y}{\partial u_{n}}\frac{\partial u_{n}}{\partial u_{n-1}}\right)\cdots\right)\frac{\partial u_{2}}{\partial u_{1}}\right)\frac{\partial u_{1}}{\partial x}$
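To make forward accumulation concrete, here is a minimal sketch using dual numbers (the Dual class is a toy of our own, not a PyTorch API); reverse accumulation is what backward() implements in the next section:
class Dual:
    """Toy dual number: carries a value together with its derivative."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
# Differentiate f(x) = x**2 + 3x + 2 at x = 1 by seeding dx/dx = 1.
x = Dual(1.0, 1.0)
three, two = Dual(3.0, 0.0), Dual(2.0, 0.0)
f = x * x + three * x + two
print(f.val, f.dot)  # 6.0 5.0, i.e. f(1) = 6 and f'(1) = 5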
3.2 Code Implementation
Suppose we differentiate the function $y=2\vec{x}^{T}\vec{x}$ with respect to the vector $\vec{x}$:
import torch
x = torch.arange(4.0, requires_grad=True)  # allocates separate storage for the gradient
print("x:", x.tolist())
y = 2 * x @ x  # build y = 2 * <x, x>
y.backward()   # backpropagate; analytically dy/dx = 4x
print("Gradient of y with respect to x:", x.grad.tolist())  # x.grad holds the gradient
# By default PyTorch accumulates gradients, so we clear the previous values
# first; try running without this line to see the difference.
x.grad.zero_()
k = x @ x
k.backward()
print("Gradient of k with respect to x:", x.grad.tolist())  # dk/dx = 2x
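One related note (an addition beyond the original snippet): backward() requires a scalar output; for a non-scalar y the usual pattern is to reduce it first, e.g. y.sum().backward(), which is equivalent to passing a vector of ones as the gradient argument:
import torch
x = torch.arange(4.0, requires_grad=True)
y = x * x               # y is a vector; y.backward() alone would raise an error
y.sum().backward()      # reduce to a scalar first
print(x.grad.tolist())  # [0.0, 2.0, 4.0, 6.0], i.e. d(sum y)/dx_i = 2 x_i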