Common Matrix Calculus Formulas for Gradient Computation
Common Formulas for Differentiating a Scalar with Respect to a Vector
Let $y = f(\boldsymbol{x})$ be a scalar function, where $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^{\rm T}$ is an $n$-dimensional column vector. The derivative of the scalar $y$ with respect to the vector $\boldsymbol{x}$ is then an $n$-dimensional column vector:
$$\frac{\partial y}{\partial \boldsymbol{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}$$
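The definition above can be checked numerically: each entry of the gradient is one partial derivative, which a central difference approximates. The following is a minimal sketch, not part of the original text; the helper name `numerical_gradient`, the step size `eps`, and the example function are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps  # perturb only the i-th coordinate
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Illustrative example (an assumption, not from the text):
# y = x_1^2 + 3 x_2, whose gradient is (2 x_1, 3).
f = lambda x: x[0] ** 2 + 3 * x[1]
x0 = np.array([1.0, 2.0])
grad = numerical_gradient(f, x0)
```

The result has the same shape as `x0`, one partial derivative per component, which mirrors the column-vector convention used in the definition.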
- Linear function: if $y = \boldsymbol{a}^{\rm T} \boldsymbol{x}$, where $\boldsymbol{a}$ is an $n$-dimensional column vector, then
$$\frac{\partial y}{\partial \boldsymbol{x}} = \boldsymbol{a}$$
- Quadratic form: if $y = \boldsymbol{x}^{\rm T} {\bm A} \boldsymbol{x}$, where ${\bm A}$ is an $n \times n$ matrix, then
$$\frac{\partial y}{\partial \boldsymbol{x}} = ({\bm A} + {\bm A}^{\rm T}) \boldsymbol{x}$$
When ${\bm A}$ is symmetric, ${\bm A}^{\rm T} = {\bm A}$, so
$$\frac{\partial y}{\partial \boldsymbol{x}} = 2{\bm A} \boldsymbol{x}$$
When ${\bm A}$ is the identity matrix, $y = \boldsymbol{x}^{\rm T} \boldsymbol{x}$, so
$$\frac{\partial y}{\partial \boldsymbol{x}} = \frac{\partial \|{\bm x}\|^2}{\partial {\bm x}} = \frac{\partial {\bm x}^{\rm T} {\bm x}}{\partial {\bm x}} = 2{\bm x}$$
Here $\|{\bm x}\|^2$ denotes the squared Euclidean norm (length) of the vector ${\bm x}$.
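The scalar-by-vector formulas in this section can be sanity-checked against finite differences. The sketch below is an illustration under stated assumptions: the helper `numerical_gradient`, the random test data, and the tolerance are not from the original text. The matrix `A` is deliberately non-symmetric so that the general formula $({\bm A} + {\bm A}^{\rm T}) \boldsymbol{x}$ is actually exercised.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Central-difference approximation of the gradient of a scalar function f
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)   # fixed seed; the data itself is arbitrary
n = 4
a = rng.standard_normal(n)
A = rng.standard_normal((n, n))  # deliberately non-symmetric
x = rng.standard_normal(n)

# Linear function: d(a^T x)/dx = a
lin_ok = np.allclose(numerical_gradient(lambda v: a @ v, x), a, atol=1e-6)

# Quadratic form: d(x^T A x)/dx = (A + A^T) x
quad_ok = np.allclose(numerical_gradient(lambda v: v @ A @ v, x),
                      (A + A.T) @ x, atol=1e-6)

# Symmetric case: for S = (A + A^T)/2 the gradient reduces to 2 S x
S = (A + A.T) / 2
sym_ok = np.allclose(numerical_gradient(lambda v: v @ S @ v, x),
                     2 * S @ x, atol=1e-6)

# Identity case: d(||x||^2)/dx = 2x
norm_ok = np.allclose(numerical_gradient(lambda v: v @ v, x), 2 * x, atol=1e-6)
```

Each `*_ok` flag compares a finite-difference gradient with the corresponding closed form from the list above.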
Common Formulas for Differentiating a Vector with Respect to a Vector
If ${\bm y}= {\bm A} \boldsymbol{x}$, where ${\bm A}$ is an $n \times n$ matrix, then
$$\frac{\partial {\bm y}}{\partial \boldsymbol{x}} = \frac{\partial {\bm A}{\bm x}}{\partial {\bm x}} = {\bm A}^{\rm T}$$
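Here ${\bm A}$ is a matrix and ${\bm x}$ is a vector, and differentiating ${\bm A}{\bm x}$ with respect to ${\bm x}$ gives the transpose ${\bm A}^{\rm T}$. This follows the denominator-layout convention, in which entry $(i, j)$ of the result is $\partial y_j / \partial x_i$. It is the same convention under which the derivative of a scalar with respect to a column vector is itself a column vector. In numerator layout the same derivative would be written ${\bm A}$.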
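To make the layout convention concrete, the derivative of ${\bm A}{\bm x}$ can be assembled entry by entry from finite differences. This is a minimal sketch assuming denominator layout, with entry $(i, j)$ equal to $\partial y_j / \partial x_i$; the function name `denominator_layout_derivative` and the test data are illustrative, not from the original text.

```python
import numpy as np

def denominator_layout_derivative(f, x, eps=1e-6):
    """Return the matrix D with D[i, j] = d f_j / d x_i (denominator layout)."""
    m = f(x).size
    D = np.zeros((x.size, m))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Row i holds the derivative of every output component w.r.t. x_i
        D[i, :] = (f(x + step) - f(x - step)) / (2 * eps)
    return D

rng = np.random.default_rng(1)   # fixed seed; arbitrary test data
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

D = denominator_layout_derivative(lambda v: A @ v, x)
matches_transpose = np.allclose(D, A.T, atol=1e-6)
```

Storing $\partial y_j / \partial x_i$ in position $(j, i)$ instead would give the numerator-layout Jacobian, which equals `A` rather than `A.T`.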
Derivative of a Composite Function
Consider the function $g(u({\bm x}))$, where ${\bm u}=u({\bm x}) = {\bm b} - {\bm A}{\bm x}$ and $g({\bm u}) = \|{\bm u}\|^2$.
Chain Rule
By the chain rule, under the denominator-layout convention (in which $\partial {\bm A}{\bm x} / \partial {\bm x} = {\bm A}^{\rm T}$), the factor $\partial {\bm u} / \partial {\bm x}$ multiplies from the left:
$$\frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial {\bm u}}{\partial {\bm x}} \cdot \frac{\partial g}{\partial {\bm u}}$$
Detailed Steps
- Compute $\dfrac{\partial {\bm u}}{\partial {\bm x}}$:
$${\bm u}({\bm x}) = {\bm b} - {\bm A}{\bm x}$$
Differentiating with respect to ${\bm x}$ (the constant ${\bm b}$ contributes nothing) gives:
$$\frac{\partial {\bm u}}{\partial {\bm x}} = -{\bm A}^{\rm T}$$
- Compute $\dfrac{\partial g({\bm u})}{\partial {\bm u}}$:
$$g({\bm u}) = \|{\bm u}\|^2 = {\bm u}^{\rm T} {\bm u}$$
Differentiating with respect to ${\bm u}$ gives:
$$\frac{\partial g({\bm u})}{\partial {\bm u}} = 2{\bm u}$$
- Apply the chain rule:
$$\frac{\partial g(u({\bm x}))}{\partial {\bm x}} = \frac{\partial {\bm u}}{\partial {\bm x}} \cdot \frac{\partial g({\bm u})}{\partial {\bm u}}$$
Substituting the two results above:
$$\frac{\partial g({\bm u}({\bm x}))}{\partial {\bm x}} = (-{\bm A}^{\rm T}) \cdot 2{\bm u} = -2{\bm A}^{\rm T} {\bm u}$$
Since ${\bm u} = {\bm b} - {\bm A}{\bm x}$, this becomes:
$$\frac{\partial g({\bm u}({\bm x}))}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x})$$
The final result is:
$$\frac{\partial \|{\bm b} - {\bm A}{\bm x}\|^2}{\partial {\bm x}} = -2{\bm A}^{\rm T} ({\bm b} - {\bm A}{\bm x})$$
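The final formula can be compared with a finite-difference gradient of $\|{\bm b} - {\bm A}{\bm x}\|^2$. The sketch below makes a few assumptions that are not in the original text: the helper `numerical_gradient`, the random data, and a rectangular `A` (5 by 3), chosen because the formula holds for any ${\bm A}$ and ${\bm b}$ with compatible shapes.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    # Central-difference approximation of the gradient of a scalar function f
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)   # fixed seed; arbitrary test data
A = rng.standard_normal((5, 3))  # assumption: rectangular A, shapes only need to match
b = rng.standard_normal(5)
x = rng.standard_normal(3)

def loss(v):
    u = b - A @ v                # u(x) = b - A x
    return u @ u                 # g(u) = ||u||^2

# Chain rule in denominator layout: (du/dx) (dg/du) = (-A^T)(2u)
u = b - A @ x
chain_rule = (-A.T) @ (2 * u)

# Closed form from the final result
closed_form = -2 * A.T @ (b - A @ x)

numeric = numerical_gradient(loss, x)
```

Here `chain_rule` and `closed_form` are algebraically the same expression, and `numeric` should agree with both up to finite-difference error.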