Jacobian Matrix Study Notes
Prerequisites
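Suppose $f:\mathbb{R}^n\to\mathbb{R}^m$ is a function from $n$-dimensional Euclidean space to $m$-dimensional Euclidean space (it need not be linear). This function is made up of $m$ real-valued functions, written: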
$$
\left\{ \begin{array}{l} y_1=f_1(x_1,x_2,\dots,x_n)\\ y_2=f_2(x_1,x_2,\dots,x_n)\\ \vdots\\ y_m=f_m(x_1,x_2,\dots,x_n)\\ \end{array} \right.
$$
We take the first-order Taylor expansion of $f(Z)$, $Z=(x_1,x_2,\dots,x_n)$, around a point $Z_0$:
$$
f(Z)=f(Z_0)+J_f(Z_0)(Z-Z_0)
$$
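Here $J_f(Z_0)$ is the derivative of $f(Z)$ at $Z_0$, which in this setting is the Jacobian matrix of $f(Z)$.
Note that because the expansion stops at first order there is a truncation error, so the equals sign here is not a strict equality; it is an approximation that holds for $Z$ close to $Z_0$.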
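To make the remark about the truncation error concrete, here is a minimal sketch in plain Python. The map `f`, its hand-derived Jacobian `jacobian_f`, and the base point `(1.0, 2.0)` are hypothetical choices made only for illustration; they do not come from the notes. The error of the first-order approximation should shrink roughly like the square of the step size.

```python
import math

def f(z):
    """Hypothetical example map f: R^2 -> R^2 (an assumption for this sketch)."""
    x1, x2 = z
    return [x1 * x1 * x2, math.sin(x1) + x2]

def jacobian_f(z):
    """Hand-derived Jacobian of the example f above."""
    x1, x2 = z
    return [[2 * x1 * x2, x1 * x1],
            [math.cos(x1), 1.0]]

def linearize(z, z0):
    """First-order Taylor approximation f(z0) + J_f(z0) (z - z0)."""
    f0 = f(z0)
    J = jacobian_f(z0)
    dz = [a - b for a, b in zip(z, z0)]
    return [f0[i] + sum(J[i][j] * dz[j] for j in range(len(dz)))
            for i in range(len(f0))]

def approximation_error(h, z0=(1.0, 2.0)):
    """Largest componentwise error after stepping h in every coordinate."""
    z = [c + h for c in z0]
    exact = f(z)
    approx = linearize(z, z0)
    return max(abs(a - b) for a, b in zip(exact, approx))

if __name__ == "__main__":
    # Shrinking the step by 10x should shrink the error by about 100x.
    for h in (1e-1, 1e-2, 1e-3):
        print(h, approximation_error(h))
```

The quadratic decay of the error is what justifies treating the "equality" as a good approximation near $Z_0$.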
Definition
The $m\times n$ Jacobian matrix:
$$
\left[ \begin{matrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_m}{\partial x_1}&\frac{\partial f_m}{\partial x_2}&\cdots&\frac{\partial f_m}{\partial x_n}\\ \end{matrix} \right]
$$
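The definition can be checked numerically. The sketch below approximates each entry $\partial f_i/\partial x_j$ by a central difference; the helper name `numerical_jacobian`, the step `h`, and the sample map `example_f` are assumptions introduced for illustration only.

```python
def numerical_jacobian(f, z, h=1e-6):
    """Approximate the m x n Jacobian of f at z by central differences."""
    n = len(z)
    m = len(f(z))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th input coordinate.
        z_plus = list(z)
        z_plus[j] += h
        z_minus = list(z)
        z_minus[j] -= h
        f_plus, f_minus = f(z_plus), f(z_minus)
        for i in range(m):
            # Row i holds the partial derivatives of f_i.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def example_f(z):
    """Hypothetical map R^2 -> R^3, chosen only for illustration."""
    x1, x2 = z
    return [x1 * x2, x1 + x2, x1 * x1]

if __name__ == "__main__":
    # Analytic Jacobian at (2, 3) is [[3, 2], [1, 1], [4, 0]].
    for row in numerical_jacobian(example_f, [2.0, 3.0]):
        print(row)
```

Note the shape: three output components and two inputs give a $3\times 2$ matrix, matching the $m\times n$ layout above.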
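It can of course also be written in row form, where each entry $\frac{\partial f}{\partial x_j}$ stands for the column of partial derivatives of $f_1,\dots,f_m$ with respect to $x_j$: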
$$
\left[ \begin{matrix} \frac{\partial f}{\partial x_1}&\frac{\partial f}{\partial x_2}&\cdots&\frac{\partial f}{\partial x_n}\\ \end{matrix} \right]
$$
This is exactly the transpose of the gradient matrix, i.e. $J_f(Z)=\nabla f(Z)^T$.
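As a small worked check of this relation, consider the scalar case $m=1$ with the hypothetical function $f(x_1,x_2)=x_1^2x_2$ (chosen here only for illustration; it does not appear in the notes). The gradient is a column vector and the Jacobian is the matching row:

```latex
\nabla f(Z)=\begin{bmatrix} 2x_1x_2 \\ x_1^2 \end{bmatrix},
\qquad
J_f(Z)=\begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix}
=\begin{bmatrix} 2x_1x_2 & x_1^2 \end{bmatrix}
=\nabla f(Z)^T
```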
The Jacobian Determinant
When $n=m$, the Jacobian matrix is square:
$$
\left[ \begin{matrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}&\frac{\partial f_n}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_n}\\ \end{matrix} \right]
$$
Rearranging the expansion above gives:
$$
f(Z)-f(Z_0)=J_f(Z_0)(Z-Z_0)
$$
Write $Z-Z_0=\Delta x$ and $f(Z)-f(Z_0)=\Delta y$, so that:
$$
\Delta y=J_f(Z_0)\Delta x
$$
Writing this out in components, with the increments taken to be differentials, gives:
$$
\left[ \begin{matrix} \mathrm{d}y_1\\ \mathrm{d}y_2\\ \vdots\\ \mathrm{d}y_n\\ \end{matrix} \right]= \left[ \begin{matrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}&\frac{\partial f_n}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_n}\\ \end{matrix} \right] \left[ \begin{matrix} \mathrm{d}x_1\\ \mathrm{d}x_2\\ \vdots\\ \mathrm{d}x_n\\ \end{matrix} \right]
$$
Multiplying out the right-hand side gives:
$$
\left[ \begin{matrix} \mathrm{d}y_1\\ \mathrm{d}y_2\\ \vdots\\ \mathrm{d}y_n\\ \end{matrix} \right]= \left[ \begin{matrix} \frac{\partial f_1}{\partial x_1}\mathrm{d}x_1+\frac{\partial f_1}{\partial x_2}\mathrm{d}x_2+\cdots+\frac{\partial f_1}{\partial x_n}\mathrm{d}x_n\\ \frac{\partial f_2}{\partial x_1}\mathrm{d}x_1+\frac{\partial f_2}{\partial x_2}\mathrm{d}x_2+\cdots+\frac{\partial f_2}{\partial x_n}\mathrm{d}x_n\\ \vdots\\ \frac{\partial f_n}{\partial x_1}\mathrm{d}x_1+\frac{\partial f_n}{\partial x_2}\mathrm{d}x_2+\cdots+\frac{\partial f_n}{\partial x_n}\mathrm{d}x_n \end{matrix} \right]
$$
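For a concrete instance of this componentwise expansion, take the hypothetical map $f(x_1,x_2)=(x_1x_2,\;x_1+x_2)$ with $n=2$ (an illustrative assumption, not an example from the notes). Each row is simply the total differential of one component:

```latex
J_f(Z)=\begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \end{bmatrix}
=\begin{bmatrix} x_2\,\mathrm{d}x_1 + x_1\,\mathrm{d}x_2 \\ \mathrm{d}x_1 + \mathrm{d}x_2 \end{bmatrix}
```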
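Next, place the $\mathrm{d}y_i$ on a diagonal and spread each sum across a row. This is a heuristic bookkeeping device rather than a strict matrix identity: the matrix on the right is $J_f(Z_0)\,\mathrm{diag}(\mathrm{d}x_1,\dots,\mathrm{d}x_n)$, and the matrix on the left is only there so that its determinant gives the product $\mathrm{d}y_1\cdots\mathrm{d}y_n$: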
$$
\left[ \begin{matrix} \mathrm{d}y_1&0&\cdots&0\\ 0&\mathrm{d}y_2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathrm{d}y_n\\ \end{matrix} \right]= \left[ \begin{matrix} \frac{\partial f_1}{\partial x_1}\mathrm{d}x_1&\frac{\partial f_1}{\partial x_2}\mathrm{d}x_2&\cdots&\frac{\partial f_1}{\partial x_n}\mathrm{d}x_n\\ \frac{\partial f_2}{\partial x_1}\mathrm{d}x_1&\frac{\partial f_2}{\partial x_2}\mathrm{d}x_2&\cdots&\frac{\partial f_2}{\partial x_n}\mathrm{d}x_n\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}\mathrm{d}x_1&\frac{\partial f_n}{\partial x_2}\mathrm{d}x_2&\cdots&\frac{\partial f_n}{\partial x_n}\mathrm{d}x_n \end{matrix} \right]
$$
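Take the determinant of both sides. Column $j$ of the right-hand matrix carries the common factor $\mathrm{d}x_j$, which can be pulled out of the determinant. Note in particular that the elements $\{\mathrm{d}x\},\{\mathrm{d}y\}$ are treated as positive quantities, so the determinant must be taken in absolute value. This gives: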
$$
\mathrm{d}y_1\cdot \mathrm{d}y_2\cdots\mathrm{d}y_n=\left| \begin{vmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}&\frac{\partial f_n}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_n}\\ \end{vmatrix} \right|\cdot\mathrm{d}x_1\cdot\mathrm{d}x_2\cdots\mathrm{d}x_n
$$
That is:
$$
\mathrm{d}y_1\cdot \mathrm{d}y_2\cdots\mathrm{d}y_n=\left|\det J_f(Z)\right|\cdot\mathrm{d}x_1\cdot\mathrm{d}x_2\cdots\mathrm{d}x_n
$$
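A familiar instance of this formula is the change to polar coordinates, where the absolute Jacobian determinant equals $r$. The sketch below is an illustration under stated assumptions: the polar map, the function names, and the midpoint-rule grid are all choices made here and are not part of the notes. It sums $|\det J|\,\mathrm{d}r\,\mathrm{d}\theta$ over a grid to recover the area of a disk.

```python
import math

def polar_to_cartesian(z):
    """The map (r, theta) -> (x, y) whose Jacobian is used below."""
    r, theta = z
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian_polar(z):
    """Hand-derived 2 x 2 Jacobian of polar_to_cartesian."""
    r, theta = z
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def disk_area(radius, steps=200):
    """Midpoint-rule sum of |det J| dr dtheta over [0, radius] x [0, 2*pi]."""
    dr = radius / steps
    dtheta = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        for k in range(steps):
            theta = (k + 0.5) * dtheta
            total += abs(det2(jacobian_polar([r, theta]))) * dr * dtheta
    return total

if __name__ == "__main__":
    print(det2(jacobian_polar([2.0, 0.7])))  # close to r = 2
    print(disk_area(1.0), math.pi)           # the two values should agree closely
```

The factor $|\det J| = r$ is the reason the area element in polar coordinates is $r\,\mathrm{d}r\,\mathrm{d}\theta$ rather than $\mathrm{d}r\,\mathrm{d}\theta$.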
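When $n=1$, i.e. for a function of one variable, the formula says that the length of a small original segment is scaled by $\left|\det J_f(Z)\right|$ to give the length of the new segment;
When $n=2$, i.e. for a function of two variables, it says that the area of a small original plane figure is scaled by $\left|\det J_f(Z)\right|$ to give the area of the new plane figure;
When $n=3$, i.e. for a function of three variables, it says that the volume of a small original solid is scaled by $\left|\det J_f(Z)\right|$ to give the volume of the new solid;
When $n>3$ the geometric meaning is hard to describe intuitively, so we leave it aside for now.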
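In other words, the absolute value of the determinant of the Jacobian matrix can be understood as a scaling ratio: it tells how a geometric measure of the original figure (length, area, volume) changes when the figure is carried to the new figure by the local linear transformation. The first-order approximation $Z\mapsto f(Z_0)+J_f(Z_0)(Z-Z_0)$ is precisely an affine transformation.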
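The scaling interpretation for $n=2$ can be tested directly. In the sketch below, the map `f` (squaring a complex number, written in real coordinates), the base point, and the side length `h` are hypothetical choices for illustration. A small square around $Z_0$ is mapped corner by corner, and the area of the resulting quadrilateral is compared with the original area. The true image has curved edges, so the quadrilateral is itself an approximation.

```python
def f(z):
    """Hypothetical map R^2 -> R^2: (x1, x2) -> (x1^2 - x2^2, 2*x1*x2)."""
    x1, x2 = z
    return (x1 * x1 - x2 * x2, 2 * x1 * x2)

def abs_det_jacobian(z):
    """|det J_f| for the map above, with J = [[2*x1, -2*x2], [2*x2, 2*x1]]."""
    x1, x2 = z
    return abs((2 * x1) * (2 * x1) - (-2 * x2) * (2 * x2))

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for k in range(len(points)):
        xa, ya = points[k]
        xb, yb = points[(k + 1) % len(points)]
        total += xa * yb - xb * ya
    return abs(total) / 2

def area_ratio(z0, h=1e-3):
    """Image area divided by original area for a square of side h at z0."""
    x1, x2 = z0
    half = h / 2
    corners = [(x1 - half, x2 - half), (x1 + half, x2 - half),
               (x1 + half, x2 + half), (x1 - half, x2 + half)]
    image = [f(c) for c in corners]
    return polygon_area(image) / polygon_area(corners)

if __name__ == "__main__":
    z0 = (1.0, 1.0)
    print(area_ratio(z0), abs_det_jacobian(z0))  # the two should agree closely
```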
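Applications
Solving certain conic-section problems by means of affine transformations (common in high-school mathematics);
Together with the Hessian matrix, serving as the foundation of the various Newton methods, and also underlying algorithms such as gradient descent (to be written up later);
Connections to robotics and kinematics (half joking~~).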
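To illustrate the Newton-method item in the list above, here is a minimal sketch of Newton's iteration for solving a nonlinear system $F(Z)=0$, where each step solves $J_F(Z)\,\Delta = -F(Z)$. This is the root-finding variant that uses only the Jacobian; the Hessian-based optimization variant mentioned in the notes is not shown. The test system (a circle of radius 2 intersected with the line $y=x$), the starting point, and all function names are assumptions made for illustration.

```python
import math

def solve_2x2(a, b):
    """Solve the 2 x 2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular Jacobian")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]

def newton_system(F, J, z, tol=1e-12, max_iter=50):
    """Newton iteration: repeatedly solve J(z) * step = -F(z), then update z."""
    for _ in range(max_iter):
        value = F(z)
        if max(abs(v) for v in value) < tol:
            break
        step = solve_2x2(J(z), [-v for v in value])
        z = [zi + si for zi, si in zip(z, step)]
    return z

def F(z):
    """Hypothetical system: x^2 + y^2 = 4 together with x = y."""
    x, y = z
    return [x * x + y * y - 4.0, x - y]

def J(z):
    """Hand-derived Jacobian of F."""
    x, y = z
    return [[2 * x, 2 * y], [1.0, -1.0]]

if __name__ == "__main__":
    root = newton_system(F, J, [1.0, 0.5])
    print(root, math.sqrt(2))  # both coordinates should be close to sqrt(2)
```

Each iteration replaces $F$ by its first-order Taylor approximation at the current point and solves the resulting linear system, which is the same linearization used throughout these notes.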