当前位置: 首页 > article >正文

ilqr算法原理推导及代码实践

目录

    • 一. ilqr原理推导
      • 1.1 ilqr问题描述
      • 1.2 ilqr算法原理
      • 1.3 ilqr算法迭代过程
    • 二. ilqr实践代码

一. ilqr原理推导

1.1 ilqr问题描述

本文参考知乎博主: LQR与iLQR:从理论到实践【详细】

基础LQR只能处理线性系统 (指可以使用 x ( k + 1 ) = A x ( k ) + B u ( k ) x(k+1) = Ax(k) + Bu(k) x(k+1)=Ax(k)+Bu(k)来描述系统 ) ,对目标代价函数也做了要求,而在实际问题中非线性系统占了绝大部分,iLQR(Iterative Linear Quadratic Regulator,迭代线性二次型调节器)则能够通过迭代的方式有效地求解非线性最优控制问题,考虑如下问题表述形式:
min ⁡ u 0 , … , u N − 1 J = l f ( x N ) + ∑ k = 0 N − 1 l ( x k , u k ) s.t. x k + 1 = f ( x k , u k ) , x 0  given \begin{gather} \min_{u_0, \ldots, u_{N-1}} & \quad J = l_f(x_N) + \sum_{k=0}^{N-1} l(x_k, u_k) \\ \text{s.t.} & \quad x_{k+1} = f(x_k, u_k), \quad x_0 \text{ given} \end{gather} u0,,uN1mins.t.J=lf(xN)+k=0N1l(xk,uk)xk+1=f(xk,uk),x0 given
其中 J J J是目标函数,需要在控制变量 u 0 , . . . , u N − 1 u_0,...,u_{N-1} u0,...,uN1下进行最小化。目标函数由两部分组成:终端状态 x N x_N xN的代价 l f ( x N ) l_f(x_N) lf(xN) 和 从 k = 0 k = 0 k=0 N − 1 N - 1 N1 的每一个时间步的代价 l ( x k , u k ) l(x_k,u_k) l(xk,uk)的总和,约束条件是状态转移方程 x k + 1 = f ( x k , u k ) x_{k+1} = f(x_k,u_k) xk+1=f(xk,uk),并且初始状态 x 0 x_0 x0是给定的。

1.2 ilqr算法原理

与LQR问题求解过程类似,定义问题的最优状态值函数 J ^ i ( x i ) \hat{J}_{i}(x_{i}) J^i(xi)和动作值函数 J ~ i ( x i , u i ) \tilde{J}_{i}(x_{i}, u_{i}) J~i(xi,ui)
J ^ i ( x i ) = min ⁡ u i { l ( x i , u i ) + J ^ i + 1 ( x i + 1 ) } = min ⁡ u i { l ( x i , u i ) + J ^ i + 1 ( f ( x i , u i ) ) } = min ⁡ u i J ~ i ( x i , u i ) \begin{align*} \hat{J}_{i}(x_{i}) &= \min_{u_{i}}\left\{ l(x_{i},u_{i}) + \hat{J}_{i+1}(x_{i+1}) \right\} \\ &= \min_{u_{i}}\left\{ l(x_{i},u_{i}) + \hat{J}_{i+1}(f(x_{i},u_{i})) \right\} \\ &= \min_{u_{i}} \tilde{J}_{i}\left( x_{i},u_{i} \right) \tag{3} \end{align*} J^i(xi)=uimin{l(xi,ui)+J^i+1(xi+1)}=uimin{l(xi,ui)+J^i+1(f(xi,ui))}=uiminJ~i(xi,ui)(3)
下面详细解释(3)是怎么来的:

  1. 最优状态值函数 J ^ i ( x i ) \hat{J}_{i}(x_{i}) J^i(xi) 定义为给定当前状态 x i x_i xi,从时间步 i i i N N N 的最小总代价,举个例子,现在状态是 x N − 1 x_{N - 1} xN1,那么可以预期的最优状态值 J ^ N − 1 ( x N − 1 ) \hat{J}_{N - 1}(x_{N - 1}) J^N1(xN1) 为当前 N − 1 N-1 N1 时刻的最小即时代价 l ( x N − 1 , u N − 1 ) l(x_{N-1},u_{N-1}) l(xN1,uN1) 加上最小终端代价 J ^ N ( x N ) \hat{J}_{N}(x_{N}) J^N(xN),就像一个人开车从A市去B市,计算最优燃油消耗量Y,那么他最省油的开法肯定是当前时刻采用最省油的开法,并在到达B市之前都一直采用最省油的开法。
  2. 动作值函数 J ~ i ( x i , u i ) \tilde{J}_{i}\left( x_{i},u_{i} \right) J~i(xi,ui) 定义为在当前状态 x i x_i xi下采取某个控制输入 u i u_i ui并遵循最优策略直到终端状态的总代价,动作值函数为最优需要满足采取的控制输入 u i u_i ui为最优
  3. 状态转移:状态 x i + 1 x_{i + 1} xi+1 由当前状态 x i x_{i} xi u i u_i ui通过状态转移函数 f ( x i , u i ) f(x_{i},u_{i}) f(xi,ui)决定,因此,可以将当前状态 x i x_i xi的最优状态值函数 J ^ i ( x i ) \hat{J}_{i}(x_{i}) J^i(xi) 表达为:
    J ^ i ( x i ) = min ⁡ u i { l ( x i , u i ) + J ^ i + 1 ( x i + 1 ) } = min ⁡ u i { l ( x i , u i ) + J ^ i + 1 ( f ( x i , u i ) ) } = min ⁡ u i J ~ i ( x i , u i ) \begin{align*} \hat{J}_{i}(x_{i}) &= \min_{u_{i}} \left\{ l(x_{i}, u_{i}) + \hat{J}_{i+1}(x_{i+1}) \right\} \\ &= \min_{u_{i}} \left\{ l(x_{i}, u_{i}) + \hat{J}_{i+1}(f(x_{i}, u_{i})) \right\} \\ &= \min_{u_{i}} \tilde{J}_{i}(x_{i}, u_{i}) \tag{4} \end{align*} J^i(xi)=uimin{l(xi,ui)+J^i+1(xi+1)}=uimin{l(xi,ui)+J^i+1(f(xi,ui))}=uiminJ~i(xi,ui)(4)
  4. 动作值函数 J ~ i ( x i , u i ) \tilde{J}_{i}\left( x_{i},u_{i} \right) J~i(xi,ui) 可以定义为:
    J ~ i ( x i , u i ) = l ( x i , u i ) + J ^ i + 1 ( f ( x i , u i ) ) \begin{align*} \tilde{J}_{i}\left( x_{i},u_{i} \right) = l(x_{i},u_{i}) + \hat{J}_{i+1}(f(x_{i},u_{i})) \tag{5} \end{align*} J~i(xi,ui)=l(xi,ui)+J^i+1(f(xi,ui))(5)
  5. 最小化动作值函数:最优状态值函数也可以通过最小化所有可能的动作值函数得到:
    J ^ i ( x i ) = min ⁡ u i J ~ i ( x i , u i ) \begin{align*} \hat{J}_{i}(x_{i}) = \min_{u_{i}} \tilde{J}_{i}(x_{i}, u_{i}) \tag{6} \end{align*} J^i(xi)=uiminJ~i(xi,ui)(6)
    坐稳了,下面开始推导,终端最优状态值函数 J ^ N ( x N ) = l f ( x N ) \hat{J}_{N}(x_{N})={l}_{f}(x_{N}) J^N(xN)=lf(xN) ,很好理解,因为到终端状态就没有输入了,所以只有状态代价,没有控制代价。
    使用泰勒展开将系统在当前状态 x i x_{i} xi附近线性化,进而转换为LQR问题进行求解:
    状态转移函数展开,这里需要了解下非线性系统是如何线性化的,链接: 雅可比矩阵几何意义的直观解释及应用
    通过雅可比矩阵得到 A i A_{i} Ai B i B_{i} Bi
    f ( x i , u i ) + δ f ( x i , u i ) = f ( x i + δ x i , u i + δ u i ) ≈ f ( x i , u i ) + ∂ f ( x i , u i ) ∂ x i δ x i + ∂ f ( x i , u i ) ∂ u i δ u i ≜ f ( x i , u i ) + A i δ x i + B i δ u i \begin{align*} f\left(x_{i}, u_{i}\right)+\delta f\left(x_{i}, u_{i}\right) &= f\left(x_{i}+\delta x_{i}, u_{i}+\delta u_{i}\right) \\ &\approx f\left(x_{i}, u_{i}\right)+\frac{\partial f\left(x_{i}, u_{i}\right)}{\partial x_{i}}\delta x_{i}+\frac{\partial f\left(x_{i}, u_{i}\right)}{\partial u_{i}}\delta u_{i} \\ &\triangleq f\left(x_{i}, u_{i}\right)+A_{i}\delta x_{i}+B_{i}\delta u_{i}\tag{7} \end{align*} f(xi,ui)+δf(xi,ui)=f(xi+δxi,ui+δui)f(xi,ui)+xif(xi,ui)δxi+uif(xi,ui)δuif(xi,ui)+Aiδxi+Biδui(7)
    状态值函数展开,这里也很好理解,注意这个 p i T p_{i}^T piT P i P_{i} Pi,后面backward pass的时候通过更新它们俩来实现状态递推
    J ^ i ( x i ) + δ J ^ i ( x i ) = J ^ i ( x i + δ x i ) ≈ J ^ i ( x i ) + ∂ J ^ i ( x i ) ∂ x i δ x i + 1 2 δ x i T ∂ 2 J ^ i ( x i ) ∂ x i 2 δ x i ≜ J ^ i ( x i ) + p i T δ x i + 1 2 δ x i T P i δ x i \begin{align*} \hat{J}_{i}(x_{i}) + \delta \hat{J}_{i}(x_{i}) &= \hat{J}_{i}(x_{i} + \delta x_{i}) \\ &\approx \hat{J}_{i}(x_{i}) + \frac{\partial \hat{J}_{i}(x_{i})}{\partial x_{i}} \delta x_{i} + \frac{1}{2} \delta x_{i}^T \frac{\partial^2 \hat{J}_{i}(x_{i})}{\partial x_{i}^2} \delta x_{i} \\ &\triangleq \hat{J}_{i}(x_{i}) + p_{i}^T \delta x_{i} + \frac{1}{2} \delta x_{i}^T P_{i} \delta x_{i}\tag{8} \end{align*} J^i(xi)+δJ^i(xi)=J^i(xi+δxi)J^i(xi)+xiJ^i(xi)δxi+21δxiTxi22J^i(xi)δxiJ^i(xi)+piTδxi+21δxiTPiδxi(8)
    动作值函数展开,常规的泰勒一阶展开和二阶展开,没有什么难理解的
    J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) ≈ J ~ i ( x i , u i ) + ∂ J ~ i ( x i , u i ) ∂ x i δ x i + ∂ J ~ i ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i ∂ x i δ x i ≜ J ~ i ( x i , u i ) + Q x i T δ x i + Q u i T δ u i + 1 2 δ x i T Q x i 2 δ x i + 1 2 δ u i T Q u i 2 δ u i + 1 2 δ x i T Q x i u i δ u i + 1 2 δ u i T Q u i x i T δ x i ( 9 ) \begin{align*} \tilde{J}_{i}\left(x_{i},u_{i}\right)+\delta\tilde{J}_{i}\left(x_{i},u_{i}\right) &= \tilde{J}_{i}\left(x_{i}+\delta x_{i},u_{i}+\delta u_{i}\right) \\ &\approx \tilde{J}_{i}\left(x_{i},u_{i}\right)+\frac{\partial\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}}\delta x_{i}+\frac{\partial\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}^{2}}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}^{2}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}\partial u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}\partial x_{i}}\delta x_{i} \\ &\triangleq \tilde{J}_{i}\left(x_{i},u_{i}\right)+Q_{x_{i}}^{T}\delta x_{i}+Q_{u_{i}}^{T}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T}Q_{x_{i}^{2}}\delta x_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T}Q_{u_{i}^{2}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T}Q_{x_{i} u_{i}}\delta u_{i} + \frac{1}{2}\delta u_{i}^{T}Q_{u_{i} x_{i}}^{T}\delta x_{i} \end{align*}\qquad(9) J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)J~i(xi,ui)+xiJ~i(xi,ui)δxi+uiJ~i(xi,ui)δui+21δxiTxi22J~i(xi,ui)δxi+21δuiTui22J~i(xi,ui)δui+21δxiTxiui2J~i(xi,ui)δui+21δuiTuixi2J~i(xi,ui)δxiJ~i(xi,ui)+QxiTδxi+QuiTδui+21δxiTQxi2δxi+21δuiTQui2δui+21δxiTQxiuiδui+21δuiTQuixiTδxi(9)
    根据公式(5),对 J ~ ( x , u ) \tilde{J}\left( x,u \right) J~(x,u)做泰勒展开有:
    J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) = l ( x i + δ x i , u i + δ u i ) + J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) ( 10 ) \begin{align*} \tilde{J}_{i}(x_{i}, u_{i}) + \delta \tilde{J}_{i}(x_{i}, u_{i}) &= \tilde{J}_{i}(x_{i} + \delta x_{i}, u_{i} + \delta u_{i}) \\ &= l(x_{i} + \delta x_{i}, u_{i} + \delta u_{i}) + \hat{J}_{i+1}(f(x_{i} + \delta x_{i}, u_{i} + \delta u_{i})) \end{align*}\qquad(10) J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)=l(xi+δxi,ui+δui)+J^i+1(f(xi+δxi,ui+δui))(10)
    其中,即时代价展开为:
    l ( x i + δ x i , u i + δ u i ) = l ( x i , u i ) + δ l ( x i , u i ) ≈ l ( x i , u i ) + ∂ l ( x i , u i ) ∂ x i δ x i + ∂ l ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 l ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 l ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 l ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 l ( x i , u i ) ∂ u i ∂ x i δ x i ≜ l ( x i , u i ) + l x i δ x i + l u i δ u i + 1 2 δ x i T l x i 2 δ x i + 1 2 δ u i T l u i 2 δ u i + 1 2 δ x i T l x i u i δ u i + 1 2 δ u i T l u i x i δ x i ( 11 ) \begin{align*} l(x_{i}+\delta x_{i},u_{i}+\delta u_{i}) &= l(x_{i},u_{i})+\delta l(x_{i},u_{i}) \\ &\approx l(x_{i},u_{i})+\frac{\partial l(x_{i},u_{i})}{\partial x_{i}}\delta x_{i}+\frac{\partial l(x_{i},u_{i})}{\partial u_{i}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial x_{i}^{2}}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial u_{i}^{2}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial x_{i}\partial u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial u_{i}\partial x_{i}}\delta x_{i} \\ &\triangleq l(x_{i},u_{i})+l_{x_{i}}\delta x_{i}+l_{u_{i}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T} l_{x_{i}^{2}}\delta x_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T} l_{u_{i}^{2}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T} l_{x_{i} u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T} l_{u_{i} x_{i}}\delta x_{i} \end{align*}\qquad(11) l(xi+δxi,ui+δui)=l(xi,ui)+δl(xi,ui)l(xi,ui)+xil(xi,ui)δxi+uil(xi,ui)δui+21δxiTxi22l(xi,ui)δxi+21δuiTui22l(xi,ui)δui+21δxiTxiui2l(xi,ui)δui+21δuiTuixi2l(xi,ui)δxil(xi,ui)+lxiδxi+luiδui+21δxiTlxi2δxi+21δuiTlui2δui+21δxiTlxiuiδui+21δuiTluixiδxi(11)
    下一时刻的状态值函数展开为(这里参考式(8)):
    J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) = J ^ i + 1 ( f ( x i , u i ) + δ f ( x i , u i ) ) ≈ J ^ i + 1 ( f ( x i , u i ) ) + ∂ J ^ i + 1 ( x i + 1 ) ∂ x i + 1 δ f ( x i , u i ) + 1 2 δ f ( x i , u i ) T ∂ 2 J ^ i + 1 ( x i + 1 ) ∂ x i + 1 2 δ f ( x i , u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T ( A i δ x i + B i δ u i ) + 1 2 ( A i δ x i + B i δ u i ) T P i + 1 ( A i δ x i + B i δ u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T A i δ x i + p i + 1 T B i δ u i + 1 2 δ x i T A i T P i + 1 A i δ x i + 1 2 δ x i T A i T P i + 1 B i δ u i + 1 2 δ u i T B i T P i + 1 A i δ x i + 1 2 δ u i T B i T P i + 1 B i δ u i ( 12 ) \begin{align*} \hat{J}_{i+1}\left(f\left(x_{i}+\delta x_{i}, u_{i}+\delta u_{i}\right)\right) &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)+\delta f\left(x_{i}, u_{i}\right)\right) \\ &\approx \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+\frac{\partial\hat{J}_{i+1}\left(x_{i+1}\right)}{\partial x_{i+1}}\delta f\left(x_{i}, u_{i}\right) \\ &\quad+\frac{1}{2}\delta f\left(x_{i}, u_{i}\right)^{T}\frac{\partial^{2}\hat{J}_{i+1}\left(x_{i+1}\right)}{\partial x_{i+1}^{2}}\delta f\left(x_{i}, u_{i}\right) \\ &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+p_{i+1}^{T}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right) \\ &\quad+\frac{1}{2}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right)^{T} P_{i+1}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right) \\ &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+p_{i+1}^{T} A_{i}\delta x_{i}+p_{i+1}^{T} B_{i}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T} A_{i}^{T} P_{i+1} A_{i}\delta x_{i}+\frac{1}{2}\delta x_{i}^{T} A_{i}^{T} P_{i+1} B_{i}\delta u_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T} B_{i}^{T} P_{i+1} A_{i}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T} B_{i}^{T} P_{i+1} B_{i}\delta u_{i} \end{align*}\qquad(12) J^i+1(f(xi+δxi,ui+δui))=J^i+1(f(xi,ui)+δf(xi,ui))J^i+1(f(xi,ui))+xi+1J^i+1(xi+1)δf(xi,ui)+21δf(xi,ui)Txi+122J^i+1(xi+1)δf(xi,ui)=J^i+1(f(xi,ui))+pi+1T(Aiδxi+Biδui)+21(Aiδxi+Biδui)TPi+1(Aiδxi+Biδui)=J^i+1(f(xi,ui))+pi+1TAiδxi+pi+1TBiδui+21δxiTAiTPi+1Aiδxi+21δxiTAiTPi+1Biδui+21δuiTBiTPi+1Aiδxi+21δuiTBiTPi+1Biδui(12)
    其中, p i = ∂ J ^ i ( x i ) ∂ x i p_{i} = \frac{\partial \hat{J}_{i}(x_{i})}{\partial x_{i}} pi=xiJ^i(xi) P i = ∂ 2 J ^ i ( x i ) ∂ x i 2 P_{i} = \frac{\partial^2 \hat{J}_{i}(x_{i})}{\partial x_{i}^2} Pi=xi22J^i(xi)。 联合(8)(9)(10)(11)(12)可得 J ~ i ( x i , u i ) \tilde{J}_{i}(x_{i}, u_{i}) J~i(xi,ui) ( x i , u i ) (x_{i}, u_{i}) (xi,ui)附近泰勒展开的多项式系数矩阵:
    Q x i = l x i + p i + 1 T A i Q u i = l u i + p i + 1 T B i Q x i 2 = l x i 2 + A i T P i + 1 A i Q x i u i = l x i u i + A i T P i + 1 B i Q u i x i = l u i x i + B i T P i + 1 A i Q u i 2 = l u i 2 + B i T P i + 1 B i \begin{align*} Q_{x_{i}} &= l_{x_{i}} + p_{i+1}^{T} A_{i} \\ Q_{u_{i}} &= l_{u_{i}} + p_{i+1}^{T} B_{i} \\ Q_{x_{i}^{2}} &= l_{x_{i}^{2}} + A_{i}^{T} P_{i+1} A_{i} \\ Q_{x_{i} u_{i}} &= l_{x_{i} u_{i}} + A_{i}^{T} P_{i+1} B_{i} \\ Q_{u_{i} x_{i}} &= l_{u_{i} x_{i}} + B_{i}^{T} P_{i+1} A_{i} \\ Q_{u_{i}^{2}} &= l_{u_{i}^{2}} + B_{i}^{T} P_{i+1} B_{i} \tag{13} \end{align*} QxiQuiQxi2QxiuiQuixiQui2=lxi+pi+1TAi=lui+pi+1TBi=lxi2+AiTPi+1Ai=lxiui+AiTPi+1Bi=luixi+BiTPi+1Ai=lui2+BiTPi+1Bi(13)
    一般来说有 Q x i u i = Q x i u i T Q_{x_{i} u_{i}}=Q_{x_{i} u_{i}}^{T} Qxiui=QxiuiT,根据式(9),如果当前状态 x i x_{i} xi,输入是 u i u_{i} ui,希望 x i + δ x i x_{i} + \delta x_{i} xi+δxi的代价值函数 J ~ i ( x i + δ x i , u i + δ u i ) \tilde{J}_{i}\left(x_{i}+\delta x_{i},u_{i}+\delta u_{i}\right) J~i(xi+δxi,ui+δui)最小,则需要使 δ J ~ i ( x i , u i ) \delta \tilde{J}_{i}(x_{i}, u_{i}) δJ~i(xi,ui)最小,有:
    δ J ^ i ( x i ) = min ⁡ δ u i δ J ~ i ( x i , u i ) = min ⁡ δ u i { Q x i T δ x i + Q u i T δ u i + 1 2 δ x i T Q x i 2 δ x i + 1 2 δ u i T Q u i 2 δ u i + 1 2 δ x i T Q x i u i δ u i + 1 2 δ u i T Q u i x i T δ x i } = min ⁡ δ u i δ J ~ i ( x i , u i ) ( 14 ) \begin{align*} \delta \hat{J}_i(x_i) &= \min_{\delta u_i} \delta \tilde{J}_i(x_i, u_i) \\ &= \min_{\delta u_i} \left\{ Q_{x_i}^T \delta x_i + Q_{u_i}^T \delta u_i + \frac{1}{2} \delta x_i^T Q_{x_i^2} \delta x_i + \frac{1}{2} \delta u_i^T Q_{u_i^2} \delta u_i \right. \\ &\quad \left. + \frac{1}{2} \delta x_i^T Q_{x_i u_i} \delta u_i + \frac{1}{2} \delta u_i^T Q_{u_i x_i}^T \delta x_i \right\} \\ &= \min_{\delta u_i} \delta \tilde{J}_i(x_i, u_i) \end{align*} \qquad(14) δJ^i(xi)=δuiminδJ~i(xi,ui)=δuimin{QxiTδxi+QuiTδui+21δxiTQxi2δxi+21δuiTQui2δui+21δxiTQxiuiδui+21δuiTQuixiTδxi}=δuiminδJ~i(xi,ui)(14)
    δ J ~ i ( x i , u i ) \delta \tilde{J}_{i}(x_{i}, u_{i}) δJ~i(xi,ui) δ u i \delta u_{i} δui求偏导,使其取得最小值得到 δ J ^ i ( x i ) \delta \hat{J}_i(x_i) δJ^i(xi),并让其等于0有:
    ∂ δ J ~ i ( x i ) ∂ δ u i = Q u i + 1 2 Q u i x i T δ x i + Q u i 2 δ u i + 1 2 Q x i u i δ x i = Q u i + Q u i x i δ x i + Q u i 2 δ u i = 0 ( 15 ) \begin{align*} \frac{\partial \delta \tilde{J}_i(x_i)}{\partial \delta u_i} &= Q_{u_i} + \frac{1}{2} Q_{u_i x_i}^T \delta x_i + Q_{u_i^2} \delta u_i + \frac{1}{2} Q_{x_i u_i} \delta x_i \\ &= Q_{u_i} + Q_{u_i x_i} \delta x_i + Q_{u_i^2} \delta u_i = 0 \end{align*} \qquad(15) δuiδJ~i(xi)=Qui+21QuixiTδxi+Qui2δui+21Qxiuiδxi=Qui+Quixiδxi+Qui2δui=0(15)
    解出来在 i i i时刻的最优控制输入的变化量:
    δ u ^ i ∗ = − Q u i 2 − 1 ( Q u i + Q u i x i δ x i ) ≜ K i δ x i + d i ( 16 ) \begin{align*} \delta \hat{u}_i^* &= -Q_{u_i^2}^{-1} \left( Q_{u_i} + Q_{u_i x_i} \delta x_i \right) \\ &\triangleq K_i \delta x_i + d_i \end{align*} \qquad(16) δu^i=Qui21(Qui+Quixiδxi)Kiδxi+di(16)
    iLQR的核心思想就是在给定初解 u i u_i ui的基础上,求最优的补偿输入 δ u ^ i ∗ \delta \hat{u}_i^* δu^i 使得在 u i + δ u ^ i ∗ u_i+\delta \hat{u}_i^* ui+δu^i的作用下获得更优的解,并循环这个过程。
    K i K_i Ki为反馈增益, d i d_i di为前馈增益。将式(16)代入到式(9)整理得:
    δ J ~ i ( x i , u i ∗ ) = Q x i T δ x i + Q u i T ( K i δ x i + d i ) + 1 2 δ x i T Q x i 2 δ x i + 1 2 ( K i δ x i + d i ) T Q u i 2 ( K i δ x i + d i ) + 1 2 δ x i T Q x i u i ( K i δ x i + d i ) + 1 2 ( K i δ x i + d i ) T Q u i x i δ x i = 1 2 δ x i T [ Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i ] δ x i + [ Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u i d i ] T δ x i + 1 2 d i T Q u i 2 d i + Q u i T d i ( 17 ) \begin{align*} \delta \tilde{J}_i(x_i, u_i^*) &=Q_{x_i}^T \delta x_i + Q_{u_i}^T (K_i \delta x_i + d_i) + \frac{1}{2} \delta x_i^T Q_{x_i^2} \delta x_i \\ &\quad+ \frac{1}{2} (K_i \delta x_i + d_i)^T Q_{u_i^2} (K_i \delta x_i + d_i) \\ &\quad + \frac{1}{2} \delta x_i^T Q_{x_i u_i} (K_i \delta x_i + d_i) + \frac{1}{2} (K_i \delta x_i + d_i)^T Q_{u_i x_i} \delta x_i \\ &= \frac{1}{2} \delta x_i^T [Q_{x_i^2} + K_i^T Q_{u_i^2} K_i + Q_{x_i u_i} K_i + K_i^T Q_{u_i x_i}] \delta x_i \\ &\quad + [Q_{x_i} + K_i^T Q_{u_i^2} d_i + Q_{x_i u_i} d_i + K_i^T Q_{u_i} d_i]^T \delta x_i \\ &\quad + \frac{1}{2} d_i^T Q_{u_i^2} d_i + Q_{u_i}^T d_i \end{align*} \qquad(17) δJ~i(xi,ui)=QxiTδxi+QuiT(Kiδxi+di)+21δxiTQxi2δxi+21(Kiδxi+di)TQui2(Kiδxi+di)+21δxiTQxiui(Kiδxi+di)+21(Kiδxi+di)TQuixiδxi=21δxiT[Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi]δxi+[Qxi+KiTQui2di+Qxiuidi+KiTQuidi]Tδxi+21diTQui2di+QuiTdi(17)
    这里需要注意的是 δ J ~ i ( x i , u i ∗ ) = δ J ^ i ( x i ) \delta \tilde{J}_i(x_i, u_i^*)=\delta \hat{J}_i(x_i) δJ~i(xi,ui)=δJ^i(xi),结合式(8)式(17)可以得到 p i p_{i} pi P i P_{i} Pi的递推关系式:
    p i = Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u P i = Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i Δ J ^ i = 1 2 d i T Q u i 2 d i + Q u i T d i ( 18 ) \begin{align*} p_i &= Q_{x_i} + K_i^T Q_{u_i^2} d_i + Q_{x_i u_i} d_i + K_i^T Q_u \\ P_i &= Q_{x_i^2} + K_i^T Q_{u_i^2} K_i + Q_{x_i u_i} K_i + K_i^T Q_{u_i x_i} \\ \Delta \hat{J}_i &= \frac{1}{2} d_i^T Q_{u_i^2} d_i + Q_{u_i}^T d_i \end{align*} \qquad(18) piPiΔJ^i=Qxi+KiTQui2di+Qxiuidi+KiTQu=Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi=21diTQui2di+QuiTdi(18)
    其中, p N T = ∂ l f ( x N ) ∂ x N p_{N}^T = \frac{\partial l_f(x_{N})}{\partial x_{N}} pNT=xNlf(xN) P N = ∂ 2 l f ( x N ) ∂ x N 2 P_{N} = \frac{\partial^2 l_f(x_{N})}{\partial x_{N}^2} PN=xN22lf(xN) Δ J ^ i \Delta \hat{J}_i ΔJ^i J ^ i ( x i ) \hat{J}_i(x_i) J^i(xi)二阶泰勒展开的高阶无穷小项,与 δ x \delta x δx无关,不影响 δ u \delta u δu的计算。
    从上面的推导可以看出,如果确定 p N T p_{N}^T pNT P N P_{N} PN,就可以通过式(18)来递推得到 p N − 1 T p_{N-1}^T pN1T P N − 1 P_{N-1} PN1之所以有backward pass,就是为了从 N − 1 N-1 N1时刻,一直倒推到当前时刻,求出每个时刻的最优控制输入的变化量。

1.3 ilqr算法迭代过程

  1. Backward pass
    初始化 p N T = ∂ l f ( x N ) ∂ x N p_{N}^T = \frac{\partial l_f(x_{N})}{\partial x_{N}} pNT=xNlf(xN) P N = ∂ 2 l f ( x N ) ∂ x N 2 P_{N} = \frac{\partial^2 l_f(x_{N})}{\partial x_{N}^2} PN=xN22lf(xN)
    f o r for for i = N − 1 , . . . , 0 i = N-1,...,0 i=N1,...,0
    根据式(13)式(18)更新 p i ← p i + 1 p_{i} \leftarrow p_{i+1} pipi+1 P i ← P i + 1 P_{i} \leftarrow P_{i+1} PiPi+1
    根据式(16)计算 K i = Q u i 2 − 1 Q u i x i K_i = Q_{u_i^2}^{-1} Q_{u_i x_i} Ki=Qui21Quixi d i = Q u i 2 − 1 Q u i d_i = Q_{u_i^2}^{-1} Q_{u_i} di=Qui21Qui
  2. Forward pass
    初始条件 x 0 x_{0} x0,对于任意迭代次数 j j j x 0 j = x 0 j − 1 x_{0}^j = x_{0}^{j-1} x0j=x0j1,这里也好理解,每次迭代都是基于当前时刻的状态,所以不会变
    f o r for for i = 0 , . . . , N − 1 i = 0,...,N-1 i=0,...,N1
    δ x i = x i j − x i j − 1 \delta x_i = x_i^j - x_i^{j-1} δxi=xijxij1 δ x i \delta x_i δxi的含义就是当前轮迭代解和上一轮迭代解在 i i i时刻的差值(是不是有牛顿法那味了)
    δ u ^ i ∗ = K i δ x i + d i \delta \hat{u}_i^* = K_i \delta x_i + d_i δu^i=Kiδxi+di,更新控制输入 u i j = u i j − 1 + δ u ^ i ∗ u_{i}^j = u_{i}^{j-1} + \delta\hat{u}_i^* uij=uij1+δu^i
    更新状态 x i + 1 j = f ( x i j , u i j ) x_{i+1}^j = f(x_{i}^{j},u_{i}^{j}) xi+1j=f(xij,uij)
  3. 满足 ∣ J ( j + 1 ) − J ( j ) J ( j ) ∣ < ϵ \left| \frac{J^{(j+1)} - J^{(j)}}{J^{(j)}} \right| < \epsilon J(j)J(j+1)J(j) <ϵ或者迭代达到最大轮次终止迭代。

二. ilqr实践代码

参考这位博主: 最优化理论与自动驾驶(十一):基于iLQR的自动驾驶轨迹跟踪算法(c++和python版本)


http://www.kler.cn/a/443546.html

相关文章:

  • DevOps流程CICD之Jenkins使用操作
  • 【持续集成与持续部署(CI/CD)工具 - Jenkins】详解
  • tar.gz压缩文件在linux上解压异常问题:gzip:stdin:invalid compressed data
  • Spring cloud GateWay入门
  • Fetch处理大模型流式数据请求与解析
  • RK3566和Robo_C的EMC防护设计细节
  • 音视频入门基础:MPEG2-TS专题(19)——FFmpeg源码中,解析TS流中的PES流的实现
  • [网络]tcp通信协议接口函数及一些简易项目
  • 【后端面试总结】深入解析进程和线程的区别
  • 《C++ 赋能强化学习:Q - learning 算法的实现之路》
  • 【Java 数据结构】如何写一副扑克牌 (附:全部码源) !!!
  • 量子通信学习路径(一)
  • JS使用random随机数实现简单的四则算数验证
  • transformer学习笔记-位置编码
  • 使用 NVIDIA DALI 计算视频的光流
  • 【设计模式】单例模式的应用场景
  • 计算机的错误计算(一百八十五)
  • AI-PR曲线
  • C++中如何实现单例模式?
  • 【Maven】基础(一)
  • 深度神经网络(DNN)在时序预测中的应用与缺陷
  • 前端 计算发布时间(如“1小时前”、“3天前”等)
  • Mybatis plus查询开用了Redis缓存,一开始挺快,用户多/时间拉长就变慢了
  • django基于python的图书馆管理系统
  • 重生之我在异世界学智力题(8)
  • Python中exifread库使用