Numerical Optimization | Gradient Descent and Conjugate Gradient Methods Illustrated (with a Case Study and Python Implementation)
Contents
- 0 About This Column
- 1 Gradient Descent and Trajectory Optimization
- 2 The Gradient Descent Algorithm
- 3 The Conjugate Gradient Algorithm
- 4 Case Study and Python Implementation
0 About This Column
🔥A must-have for course projects, theses, and innovation competitions!🔥 This column covers advanced trajectory-optimization practice for motion planning, including curve generation, collision detection, safe corridors, optimization modeling (QP, SQP, NMPC, iLQR, etc.), and trajectory optimization (gradient-based methods, curve-based methods, etc.). Every algorithm comes with a code implementation to deepen understanding.
🚀Details: Motion Planning in Practice, Advanced: Trajectory Optimization
1 Gradient Descent and Trajectory Optimization
In a complex environment, a robot must find an optimal path from a start point to a goal. By expressing the path as a cost function, gradient descent can be used to minimize that cost and thereby find the desired path. For example, in Trajectory Optimization | ESDF-Based Conjugate Gradient Optimization (with ROS C++/Python Simulation), for the path sequence $\mathcal{X} = \left\{ \boldsymbol{x}_i = \left( x_i, y_i \right) \mid i \in \left[ 1, N \right] \right\}$ we designed the objective function

$$f\left( \mathcal{X} \right) = w_o P_{\mathrm{obs}}\left( \mathcal{X} \right) + w_{\kappa} P_{\mathrm{cur}}\left( \mathcal{X} \right) + w_s P_{\mathrm{smo}}\left( \mathcal{X} \right)$$

and used it to optimize the trajectory. Other algorithms, such as GRIP and the Constrained Smoother, are likewise built on gradient descent.
In practice, however, many students do not understand the numerical principles behind gradient methods, which turns parameter tuning into guesswork. This section explains the gradient descent algorithm from the perspective of numerical optimization.
2 The Gradient Descent Algorithm
Let $\boldsymbol{x}_k$ denote the iterate after the $k$-th iteration, and expand the objective $F\left( \boldsymbol{x} \right)$ in a first-order Taylor series at $\boldsymbol{x} = \boldsymbol{x}_k$:

$$F\left( \boldsymbol{x} \right) = F\left( \boldsymbol{x}_k \right) + \nabla^T F\left( \boldsymbol{x}_k \right) \left( \boldsymbol{x} - \boldsymbol{x}_k \right) + o\left( \left\| \boldsymbol{x} - \boldsymbol{x}_k \right\| \right)$$

We want the next iterate to satisfy

$$F\left( \boldsymbol{x}_{k+1} \right) = F\left( \boldsymbol{x}_k \right) + \nabla^T F\left( \boldsymbol{x}_k \right) \left( \boldsymbol{x}_{k+1} - \boldsymbol{x}_k \right) < F\left( \boldsymbol{x}_k \right)$$

that is,

$$\nabla^T F\left( \boldsymbol{x}_k \right) \left( \boldsymbol{x}_{k+1} - \boldsymbol{x}_k \right) < 0$$

Taking $\boldsymbol{x}_{k+1} - \boldsymbol{x}_k = -\alpha \nabla F\left( \boldsymbol{x}_k \right)$ gives

$$F\left( \boldsymbol{x}_{k+1} \right) - F\left( \boldsymbol{x}_k \right) = -\alpha \nabla^T F\left( \boldsymbol{x}_k \right) \nabla F\left( \boldsymbol{x}_k \right) \leqslant 0$$

which guarantees that the function value decreases. The iteration

$$\boldsymbol{x}_{k+1} = \boldsymbol{x}_k - \alpha \nabla F\left( \boldsymbol{x}_k \right)$$

is called gradient descent (GD). Here $\alpha$ is the step size, or learning rate, which can be taken as a constant or computed by line search. It must be noted that the Taylor expansion holds only for $\boldsymbol{x} \rightarrow \boldsymbol{x}_k$, so the step $\alpha \nabla F\left( \boldsymbol{x}_k \right)$ must not be too large; otherwise $\boldsymbol{x}_{k+1}$ lands too far from $\boldsymbol{x}_k$ and the remainder term introduces error. Since it uses only first-order gradient information, gradient descent is a first-order unconstrained optimization method. Because every step follows the negative gradient direction, it may converge in a zigzag pattern and is sensitive to the initial guess, as shown in the figure; nor can it avoid getting trapped in local minima.
From a convergence standpoint, when the gradient of the objective is Lipschitz continuous and the step size satisfies the Wolfe conditions, gradient descent is guaranteed to converge, at a linear rate. For the Wolfe conditions, see Numerical Optimization | Illustrated Line Search: the Armijo, Goldstein, and Wolfe Conditions (with Case Study and Python Implementation).
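The update rule and the step-size caveat above can be illustrated on a one-dimensional quadratic (the toy function and step sizes below are my own illustration, not from the text):

```python
# Minimize F(x) = x^2, whose gradient is 2x, with the update x <- x - alpha * 2x.
def gd_1d(alpha: float, x0: float = 1.0, iters: int = 50) -> float:
    x = x0
    for _ in range(iters):
        x = x - alpha * 2.0 * x  # gradient descent step
    return x

print(gd_1d(0.1))  # converges toward the minimizer 0
print(gd_1d(1.1))  # step too large: the iterates oscillate and blow up
```

Each step multiplies $x$ by $1 - 2\alpha$, so the iteration contracts only when $|1 - 2\alpha| < 1$, i.e. $0 < \alpha < 1$; this is the numerical face of the requirement that the step stay small enough for the Taylor expansion to be valid.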
3 The Conjugate Gradient Algorithm
The conjugate gradient (CG) method is a gradient-type algorithm for $n$-dimensional quadratic optimization problems. Given

$$\boldsymbol{x}^* = \mathrm{arg}\min_{\boldsymbol{x}} \frac{1}{2}\boldsymbol{x}^T\boldsymbol{Q}\boldsymbol{x} + \boldsymbol{q}^T\boldsymbol{x}$$

where $\boldsymbol{Q}$ is an $n \times n$ symmetric positive definite matrix and $\boldsymbol{q} \in \mathbb{R}^n$, the conjugate gradient method both overcomes the slow convergence of gradient descent and avoids storing and computing the second-order information required by Newton-type algorithms.

The core idea is to construct a set of $\boldsymbol{Q}$-conjugate directions $\boldsymbol{d}_0, \boldsymbol{d}_1, \cdots, \boldsymbol{d}_{n-1}$ as the $n$ search directions. Because these directions are mutually orthogonal under the inner product induced by $\boldsymbol{Q}$, each iteration only needs to minimize along a single direction $\boldsymbol{d}_i$ without disturbing progress along the others. In theory the optimum is therefore reached in at most $n$ iterations, so convergence is fast; in practice a stopping threshold still has to be chosen case by case. A concrete application can be found in Trajectory Optimization | ESDF-Based Conjugate Gradient Optimization (with ROS C++/Python Simulation).
The conjugate gradient iteration is

$$\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha_k \boldsymbol{d}_k$$

Next we derive the step size and the search direction for each iteration.
- Search direction $\boldsymbol{d}_k$

Let $\boldsymbol{d}_k = -\nabla F\left( \boldsymbol{x}_k \right) + \gamma_{k-1}\boldsymbol{d}_{k-1}$. Left-multiplying both sides by $\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}$ and using the conjugacy relation $\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\boldsymbol{d}_k = 0$ gives

$$\begin{aligned} -\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\nabla F\left( \boldsymbol{x}_k \right) + \gamma_{k-1}\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\boldsymbol{d}_{k-1} &= \boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\boldsymbol{d}_k \\ \Rightarrow\ \gamma_{k-1}\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\boldsymbol{d}_{k-1} &= \nabla^T F\left( \boldsymbol{x}_k \right) \boldsymbol{Q}\boldsymbol{d}_{k-1} \\ \Rightarrow\ \gamma_{k-1} &= \frac{\nabla^T F\left( \boldsymbol{x}_k \right) \boldsymbol{Q}\boldsymbol{d}_{k-1}}{\boldsymbol{d}_{k-1}^{T}\boldsymbol{Q}\boldsymbol{d}_{k-1}} \end{aligned}$$

where $\nabla F\left( \boldsymbol{x}_k \right) = \boldsymbol{Q}\boldsymbol{x}_k + \boldsymbol{q}$. Noting that $\nabla F\left( \boldsymbol{x}_{k+1} \right) - \nabla F\left( \boldsymbol{x}_k \right) = \boldsymbol{Q}\left( \boldsymbol{x}_{k+1} - \boldsymbol{x}_k \right) = \alpha_k \boldsymbol{Q}\boldsymbol{d}_k$, we obtain

$$\gamma_{k-1} = \frac{\nabla^T F\left( \boldsymbol{x}_k \right) \left( \nabla F\left( \boldsymbol{x}_k \right) - \nabla F\left( \boldsymbol{x}_{k-1} \right) \right)}{\boldsymbol{d}_{k-1}^{T}\left( \nabla F\left( \boldsymbol{x}_k \right) - \nabla F\left( \boldsymbol{x}_{k-1} \right) \right)}$$
From

$$\begin{aligned} \boldsymbol{d}_k &= -\nabla F\left( \boldsymbol{x}_k \right) + \gamma_{k-1}\boldsymbol{d}_{k-1} \\ \Rightarrow \boldsymbol{d}_{k}^{T}\nabla F\left( \boldsymbol{x}_{k+1} \right) &= -\nabla^T F\left( \boldsymbol{x}_k \right) \nabla F\left( \boldsymbol{x}_{k+1} \right) + \gamma_{k-1}\boldsymbol{d}_{k-1}^{T}\nabla F\left( \boldsymbol{x}_{k+1} \right) \end{aligned}$$

and the expanding subspace theorem, $\boldsymbol{d}_{k}^{T}\nabla F\left( \boldsymbol{x}_{k+1} \right) = \boldsymbol{d}_{k-1}^{T}\nabla F\left( \boldsymbol{x}_{k+1} \right) = 0$, and hence $\nabla^T F\left( \boldsymbol{x}_k \right) \nabla F\left( \boldsymbol{x}_{k-1} \right) = 0$. The coefficient simplifies to

$$\gamma_{k-1} = \frac{-\nabla^T F\left( \boldsymbol{x}_k \right) \nabla F\left( \boldsymbol{x}_k \right)}{\boldsymbol{d}_{k-1}^{T}\nabla F\left( \boldsymbol{x}_{k-1} \right)}$$

Since

$$\boldsymbol{d}_{k-1}^{T}\nabla F\left( \boldsymbol{x}_{k-1} \right) = \left( -\nabla F\left( \boldsymbol{x}_{k-1} \right) + \gamma_{k-2}\boldsymbol{d}_{k-2} \right)^T \nabla F\left( \boldsymbol{x}_{k-1} \right) = -\nabla^T F\left( \boldsymbol{x}_{k-1} \right) \nabla F\left( \boldsymbol{x}_{k-1} \right)$$

we arrive at the Fletcher–Reeves form

$$\gamma_{k-1} = \frac{\nabla^T F\left( \boldsymbol{x}_k \right) \nabla F\left( \boldsymbol{x}_k \right)}{\nabla^T F\left( \boldsymbol{x}_{k-1} \right) \nabla F\left( \boldsymbol{x}_{k-1} \right)}$$
- Step size $\alpha_k$

Since

$$F\left( \boldsymbol{x}_{k+1} \right) = \frac{1}{2}\boldsymbol{x}_{k+1}^{T}\boldsymbol{Q}\boldsymbol{x}_{k+1} + \boldsymbol{q}^T\boldsymbol{x}_{k+1} = \frac{1}{2}\left( \boldsymbol{x}_k + \alpha_k\boldsymbol{d}_k \right)^T \boldsymbol{Q}\left( \boldsymbol{x}_k + \alpha_k\boldsymbol{d}_k \right) + \boldsymbol{q}^T\left( \boldsymbol{x}_k + \alpha_k\boldsymbol{d}_k \right)$$

setting $\partial F\left( \boldsymbol{x}_{k+1} \right) / \partial \alpha_k = 0$ immediately yields

$$\alpha_k = \frac{-\boldsymbol{d}_{k}^{T}\nabla F\left( \boldsymbol{x}_k \right)}{\boldsymbol{d}_{k}^{T}\boldsymbol{Q}\boldsymbol{d}_k}$$
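Both formulas can be sanity-checked numerically: with the exact step, the directional derivative of $F$ along $\boldsymbol{d}_k$ vanishes at $\alpha_k$, and the next Fletcher–Reeves direction is $\boldsymbol{Q}$-conjugate to the current one. A small self-contained check (the matrix is the one used in Section 4; the script itself is my own sketch):

```python
import numpy as np

Q = np.array([[10.0, -9.0], [-9.0, 10.0]])
q = np.array([4.0, -15.0])

x0 = np.zeros(2)
g0 = Q @ x0 + q
d0 = -g0                               # first direction: steepest descent
alpha0 = -(d0 @ g0) / (d0 @ Q @ d0)    # closed-form exact step
x1 = x0 + alpha0 * d0
g1 = Q @ x1 + q

# Exact line search: the slope of F(x0 + a*d0) vanishes at a = alpha0.
print(d0 @ g1)                         # ~0

# Fletcher-Reeves update: the new direction d1 is Q-conjugate to d0.
gamma0 = (g1 @ g1) / (g0 @ g0)
d1 = -g1 + gamma0 * d0
print(d1 @ Q @ d0)                     # ~0
```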
4 Case Study and Python Implementation

The quadratic form is a common test problem in numerical optimization. Consider

$$f\left( \boldsymbol{x} \right) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{Q}\boldsymbol{x} + \boldsymbol{q}^T\boldsymbol{x}$$

where

$$\boldsymbol{Q} = \begin{bmatrix} 10 & -9 \\ -9 & 10 \end{bmatrix}, \quad \boldsymbol{q} = \begin{bmatrix} 4 \\ -15 \end{bmatrix}$$
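Before running either iterative method, it is worth computing the exact answer for this case: $\boldsymbol{Q}$ is symmetric positive definite, so the unique minimizer solves the stationarity condition $\boldsymbol{Q}\boldsymbol{x}^* = -\boldsymbol{q}$ (a quick check of mine, not part of the original implementation):

```python
import numpy as np

Q = np.array([[10.0, -9.0], [-9.0, 10.0]])
q = np.array([4.0, -15.0])

# Q is symmetric with eigenvalues 1 and 19, hence positive definite.
print(np.linalg.eigvalsh(Q))        # eigenvalues 1 and 19

# The unique minimizer of 0.5 x^T Q x + q^T x satisfies Q x = -q.
x_star = np.linalg.solve(Q, -q)
print(x_star)                       # the minimizer [5, 6]
```

Both algorithms below should therefore converge to $\boldsymbol{x}^* = (5, 6)$.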
Below is the gradient descent implementation:

```python
def run(self, func, g_func, x0: np.ndarray) -> tuple:
    x = x0
    traj = []
    iteration = 0
    while iteration < self.max_iters:
        iteration += 1
        traj.append(x)
        gx = g_func(x)
        # stop when the gradient norm is small enough
        if np.linalg.norm(gx) < self.eps:
            break
        # fixed step size, or line search along the negative gradient
        alpha = self.alpha if self.alpha else self.line_searcher_.search(func, g_func, x, -gx)
        x_next = x - alpha * gx
        # stop when the objective no longer decreases noticeably
        if abs(func(x) - func(x_next)) < self.eps:
            break
        x = x_next
    return x, traj
```
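Assuming the surrounding class simply stores `max_iters`, `eps`, and `alpha`, the loop above can be exercised as a standalone sketch on this case's quadratic (the function name and the fixed step size are my own choices for illustration):

```python
import numpy as np

def gradient_descent(func, g_func, x0, alpha=0.05, eps=1e-6, max_iters=10000):
    # Iterate x <- x - alpha * grad F(x) until the gradient norm is small.
    x = x0
    for _ in range(max_iters):
        gx = g_func(x)
        if np.linalg.norm(gx) < eps:
            break
        x = x - alpha * gx
    return x

Q = np.array([[10.0, -9.0], [-9.0, 10.0]])
q = np.array([4.0, -15.0])
x_min = gradient_descent(lambda x: 0.5 * x @ Q @ x + q @ x,
                         lambda x: Q @ x + q,
                         np.zeros(2))
print(x_min)  # approaches the minimizer [5, 6]
```

The fixed step must satisfy $\alpha < 2/\lambda_{\max}(\boldsymbol{Q}) = 2/19$ here, which is why a small value like 0.05 is used.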
Below is the conjugate gradient implementation (note that the initial direction is the negative gradient):

```python
def run(self, Q: np.ndarray, q: np.ndarray, x0: np.ndarray) -> tuple:
    """
    minimize 0.5 * x^T Q x + q^T x
    """
    def func(x: np.ndarray) -> float:
        return 0.5 * x.T @ Q @ x + q.T @ x

    def g_func(x: np.ndarray) -> np.ndarray:
        return Q @ x + q

    x = x0
    d = -g_func(x0)  # initial direction: steepest descent
    traj = []
    iteration = 0
    while iteration < self.max_iters:
        iteration += 1
        traj.append(x)
        gx = g_func(x)
        if np.linalg.norm(gx) < self.eps:
            break
        # exact line search step on the quadratic
        alpha = -(d @ gx) / (d.T @ Q @ d)
        x_next = x + alpha * d
        gx_next = g_func(x_next)
        # Fletcher-Reeves update of the search direction
        d = -gx_next + (gx_next.T @ gx_next) / (gx.T @ gx) * d
        x = x_next
    return x, traj
```
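As a standalone check of the "at most $n$ iterations" property, the same loop can be run as a free function on the case's 2-D quadratic (the wrapper below is my own sketch):

```python
import numpy as np

def conjugate_gradient(Q, q, x0, eps=1e-8, max_iters=100):
    # Minimize 0.5 x^T Q x + q^T x; for an n-dim SPD Q this takes at most n steps.
    x = x0
    g = Q @ x + q
    d = -g                                  # first direction: steepest descent
    steps = 0
    while np.linalg.norm(g) > eps and steps < max_iters:
        alpha = -(d @ g) / (d @ Q @ d)      # exact step on the quadratic
        x = x + alpha * d
        g_next = Q @ x + q
        d = -g_next + (g_next @ g_next) / (g @ g) * d  # Fletcher-Reeves
        g = g_next
        steps += 1
    return x, steps

Q = np.array([[10.0, -9.0], [-9.0, 10.0]])
q = np.array([4.0, -15.0])
x_min, steps = conjugate_gradient(Q, q, np.zeros(2))
print(x_min, steps)  # reaches the minimizer [5, 6] in 2 steps
```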
The results are shown in the figure. Comparing the iteration trajectories of the two methods, the conjugate gradient trajectory is visibly smoother and converges much faster (roughly a 2000× speedup on this case).
For the complete project code, please contact the author via the profile card below.
🔥 More columns:
- ROS from Beginner to Expert
- PyTorch Deep Learning in Practice
- Machine Learning Foundations
- Motion Planning in Practice
- …