[Class Notes] Matrix Calculus Derivation of Gradient Descent for Linear Regression
Linear Regression
Given a dataset $\mathcal{D}=\{X, y\}$, with learnable parameters $w \in \mathbb{R}^D$, where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} \in \mathbb{R}^N, \qquad X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1D}\\ x_{21} & x_{22} & \cdots & x_{2D}\\ \vdots & \vdots & \ddots & \vdots\\ x_{N1} & x_{N2} & \cdots & x_{ND} \end{pmatrix} \in \mathbb{R}^{N \times D}$$
Define the error (residual) vector

$$e = y - Xw = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix} \in \mathbb{R}^N, \qquad \text{where } e_i = y_i - x_i^Tw$$
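A minimal NumPy sketch (the synthetic `X`, `y`, `w` below are assumptions, not from the notes) confirming that the vectorized residual `y - X @ w` matches the per-element definition $e_i = y_i - x_i^Tw$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3                       # small synthetic problem
X = rng.normal(size=(N, D))       # design matrix, one row x_i^T per sample
y = rng.normal(size=N)            # targets
w = rng.normal(size=D)            # parameter vector

e = y - X @ w                                            # vectorized residual
e_loop = np.array([y[i] - X[i] @ w for i in range(N)])   # e_i = y_i - x_i^T w
match = np.allclose(e, e_loop)
print(match)                      # the two computations agree
```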
Then the MSE loss is

$$\mathcal{L}(w) = \frac{1}{2N}\sum_{n=1}^{N}(y_n - x_n^Tw)^2=\frac{1}{2N}e^Te$$
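The summation form and the quadratic form $\frac{1}{2N}e^Te$ are the same number; a quick NumPy check (synthetic data assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
w = rng.normal(size=D)

e = y - X @ w
loss_sum = np.sum((y - X @ w) ** 2) / (2 * N)   # (1/2N) * sum_n (y_n - x_n^T w)^2
loss_vec = (e @ e) / (2 * N)                    # (1/2N) * e^T e
same = np.isclose(loss_sum, loss_vec)
print(same)
```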
Next, compute $\frac{\partial \mathcal{L}(w)}{\partial w}$:
$$\mathcal{L}(w) = \frac{1}{2N}e^Te = \frac{1}{2N}(y - Xw)^T(y - Xw)=\frac{1}{2N}(y^T-w^TX^T)(y - Xw)$$

$$=\frac{1}{2N}\left(y^Ty-y^TXw-w^TX^Ty+w^TX^TXw\right)$$
Term by term:

$$\frac{\partial\, y^TXw}{\partial w} = X^Ty$$
$$\frac{\partial\, w^TX^Ty}{\partial w}=\frac{\partial\, y^TXw}{\partial w}=X^Ty$$

since $w^TX^Ty$ is a scalar, and a scalar equals its own transpose.
$$\frac{\partial\, w^TX^TXw}{\partial w}=2X^TXw$$
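This quadratic-form identity (valid here because $X^TX$ is symmetric) can be sanity-checked against a central finite-difference approximation; the synthetic data below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 4
X = rng.normal(size=(N, D))
w = rng.normal(size=D)

A = X.T @ X                       # symmetric, so d(w^T A w)/dw = 2 A w
analytic = 2 * A @ w

# central finite differences on f(w) = w^T A w, one coordinate at a time
eps = 1e-6
numeric = np.empty(D)
for j in range(D):
    d = np.zeros(D)
    d[j] = eps
    numeric[j] = ((w + d) @ A @ (w + d) - (w - d) @ A @ (w - d)) / (2 * eps)

identity_ok = np.allclose(analytic, numeric)
print(identity_ok)
```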
Therefore

$$\nabla\mathcal{L}(w) = \frac{\partial \mathcal{L}(w)}{\partial w} = -\frac{1}{2N}(2X^Ty - 2X^TXw) = -\frac{1}{N}X^Te$$
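Putting it together, a sketch (synthetic data and learning rate are assumptions) that checks $\nabla\mathcal{L}(w) = -\frac{1}{N}X^Te$ against finite differences, then runs a few gradient-descent steps:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])          # hypothetical ground-truth weights
y = X @ w_true + 0.01 * rng.normal(size=N)   # targets with small noise

def loss(w):
    e = y - X @ w
    return (e @ e) / (2 * N)                 # L(w) = e^T e / (2N)

def grad(w):
    e = y - X @ w
    return -(X.T @ e) / N                    # the derived -X^T e / N

# finite-difference check of the analytic gradient
w = rng.normal(size=D)
eps = 1e-6
numeric = np.array([
    (loss(w + eps * np.eye(D)[j]) - loss(w - eps * np.eye(D)[j])) / (2 * eps)
    for j in range(D)
])
grad_ok = np.allclose(grad(w), numeric, atol=1e-6)
print(grad_ok)

# gradient descent: w <- w - lr * grad(w) should decrease the loss
lr = 0.1
l0 = loss(w)
for _ in range(100):
    w -= lr * grad(w)
loss_decreased = loss(w) < l0
print(loss_decreased)
```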