ilqr算法原理推导及代码实践
目录
- 一. ilqr原理推导
- 1.1 ilqr问题描述
- 1.2 ilqr算法原理
- 1.3 ilqr算法迭代过程
- 二. ilqr实践代码
一. ilqr原理推导
1.1 ilqr问题描述
本文参考知乎博主: LQR与iLQR:从理论到实践【详细】
基础LQR只能处理线性系统 (指可以使用
x
(
k
+
1
)
=
A
x
(
k
)
+
B
u
(
k
)
x(k+1) = Ax(k) + Bu(k)
x(k+1)=Ax(k)+Bu(k)来描述系统 ) ,对目标代价函数也做了要求,而在实际问题中非线性系统占了绝大部分,iLQR(Iterative Linear Quadratic Regulator,迭代线性二次型调节器)则能够通过迭代的方式有效地求解非线性最优控制问题,考虑如下问题表述形式:
min
u
0
,
…
,
u
N
−
1
J
=
l
f
(
x
N
)
+
∑
k
=
0
N
−
1
l
(
x
k
,
u
k
)
s.t.
x
k
+
1
=
f
(
x
k
,
u
k
)
,
x
0
given
\begin{gather} \min_{u_0, \ldots, u_{N-1}} & \quad J = l_f(x_N) + \sum_{k=0}^{N-1} l(x_k, u_k) \\ \text{s.t.} & \quad x_{k+1} = f(x_k, u_k), \quad x_0 \text{ given} \end{gather}
u0,…,uN−1mins.t.J=lf(xN)+k=0∑N−1l(xk,uk)xk+1=f(xk,uk),x0 given
其中
J
J
J是目标函数,需要在控制变量
u
0
,
.
.
.
,
u
N
−
1
u_0,...,u_{N-1}
u0,...,uN−1下进行最小化。目标函数由两部分组成:终端状态
x
N
x_N
xN的代价
l
f
(
x
N
)
l_f(x_N)
lf(xN) 和 从
k
=
0
k = 0
k=0 到
N
−
1
N - 1
N−1 的每一个时间步的代价
l
(
x
k
,
u
k
)
l(x_k,u_k)
l(xk,uk)的总和,约束条件是状态转移方程
x
k
+
1
=
f
(
x
k
,
u
k
)
x_{k+1} = f(x_k,u_k)
xk+1=f(xk,uk),并且初始状态
x
0
x_0
x0是给定的。
1.2 ilqr算法原理
与LQR问题求解过程类似,定义问题的最优状态值函数
J
^
i
(
x
i
)
\hat{J}_{i}(x_{i})
J^i(xi)和动作值函数
J
~
i
(
x
i
,
u
i
)
\tilde{J}_{i}(x_{i}, u_{i})
J~i(xi,ui):
J
^
i
(
x
i
)
=
min
u
i
{
l
(
x
i
,
u
i
)
+
J
^
i
+
1
(
x
i
+
1
)
}
=
min
u
i
{
l
(
x
i
,
u
i
)
+
J
^
i
+
1
(
f
(
x
i
,
u
i
)
)
}
=
min
u
i
J
~
i
(
x
i
,
u
i
)
\begin{align*} \hat{J}_{i}(x_{i}) &= \min_{u_{i}}\left\{ l(x_{i},u_{i}) + \hat{J}_{i+1}(x_{i+1}) \right\} \\ &= \min_{u_{i}}\left\{ l(x_{i},u_{i}) + \hat{J}_{i+1}(f(x_{i},u_{i})) \right\} \\ &= \min_{u_{i}} \tilde{J}_{i}\left( x_{i},u_{i} \right) \tag{3} \end{align*}
J^i(xi)=uimin{l(xi,ui)+J^i+1(xi+1)}=uimin{l(xi,ui)+J^i+1(f(xi,ui))}=uiminJ~i(xi,ui)(3)
下面详细解释(3)是怎么来的:
- 最优状态值函数 J ^ i ( x i ) \hat{J}_{i}(x_{i}) J^i(xi) 定义为给定当前状态 x i x_i xi,从时间步 i i i 到 N N N 的最小总代价,举个例子,现在状态是 x N − 1 x_{N - 1} xN−1,那么可以预期的最优状态值 J ^ N − 1 ( x N − 1 ) \hat{J}_{N - 1}(x_{N - 1}) J^N−1(xN−1) 为当前 N − 1 N-1 N−1 时刻的最小即时代价 l ( x N − 1 , u N − 1 ) l(x_{N-1},u_{N-1}) l(xN−1,uN−1) 加上最小终端代价 J ^ N ( x N ) \hat{J}_{N}(x_{N}) J^N(xN),就像一个人开车从A市去B市,计算最优燃油消耗量Y,那么他最省油的开法肯定是当前时刻采用最省油的开法,并在到达B市之前都一直采用最省油的开法。
- 动作值函数 J ~ i ( x i , u i ) \tilde{J}_{i}\left( x_{i},u_{i} \right) J~i(xi,ui) 定义为在当前状态 x i x_i xi下采取某个控制输入 u i u_i ui并遵循最优策略直到终端状态的总代价,动作值函数为最优需要满足采取的控制输入 u i u_i ui为最优。
- 状态转移:状态
x
i
+
1
x_{i + 1}
xi+1 由当前状态
x
i
x_{i}
xi和
u
i
u_i
ui通过状态转移函数
f
(
x
i
,
u
i
)
f(x_{i},u_{i})
f(xi,ui)决定,因此,可以将当前状态
x
i
x_i
xi的最优状态值函数
J
^
i
(
x
i
)
\hat{J}_{i}(x_{i})
J^i(xi) 表达为:
J ^ i ( x i ) = min u i { l ( x i , u i ) + J ^ i + 1 ( x i + 1 ) } = min u i { l ( x i , u i ) + J ^ i + 1 ( f ( x i , u i ) ) } = min u i J ~ i ( x i , u i ) \begin{align*} \hat{J}_{i}(x_{i}) &= \min_{u_{i}} \left\{ l(x_{i}, u_{i}) + \hat{J}_{i+1}(x_{i+1}) \right\} \\ &= \min_{u_{i}} \left\{ l(x_{i}, u_{i}) + \hat{J}_{i+1}(f(x_{i}, u_{i})) \right\} \\ &= \min_{u_{i}} \tilde{J}_{i}(x_{i}, u_{i}) \tag{4} \end{align*} J^i(xi)=uimin{l(xi,ui)+J^i+1(xi+1)}=uimin{l(xi,ui)+J^i+1(f(xi,ui))}=uiminJ~i(xi,ui)(4) - 动作值函数
J
~
i
(
x
i
,
u
i
)
\tilde{J}_{i}\left( x_{i},u_{i} \right)
J~i(xi,ui) 可以定义为:
J ~ i ( x i , u i ) = l ( x i , u i ) + J ^ i + 1 ( f ( x i , u i ) ) \begin{align*} \tilde{J}_{i}\left( x_{i},u_{i} \right) = l(x_{i},u_{i}) + \hat{J}_{i+1}(f(x_{i},u_{i})) \tag{5} \end{align*} J~i(xi,ui)=l(xi,ui)+J^i+1(f(xi,ui))(5) - 最小化动作值函数:最优状态值函数也可以通过最小化所有可能的动作值函数得到:
J ^ i ( x i ) = min u i J ~ i ( x i , u i ) \begin{align*} \hat{J}_{i}(x_{i}) = \min_{u_{i}} \tilde{J}_{i}(x_{i}, u_{i}) \tag{6} \end{align*} J^i(xi)=uiminJ~i(xi,ui)(6)
坐稳了,下面开始推导,终端最优状态值函数 J ^ N ( x N ) = l f ( x N ) \hat{J}_{N}(x_{N})={l}_{f}(x_{N}) J^N(xN)=lf(xN) ,很好理解,因为到终端状态就没有输入了,所以只有状态代价,没有控制代价。
使用泰勒展开将系统在当前状态 x i x_{i} xi附近线性化,进而转换为LQR问题进行求解:
状态转移函数展开,这里需要了解下非线性系统是如何线性化的,链接: 雅可比矩阵几何意义的直观解释及应用
通过雅可比矩阵得到 A i A_{i} Ai和 B i B_{i} Bi
f ( x i , u i ) + δ f ( x i , u i ) = f ( x i + δ x i , u i + δ u i ) ≈ f ( x i , u i ) + ∂ f ( x i , u i ) ∂ x i δ x i + ∂ f ( x i , u i ) ∂ u i δ u i ≜ f ( x i , u i ) + A i δ x i + B i δ u i \begin{align*} f\left(x_{i}, u_{i}\right)+\delta f\left(x_{i}, u_{i}\right) &= f\left(x_{i}+\delta x_{i}, u_{i}+\delta u_{i}\right) \\ &\approx f\left(x_{i}, u_{i}\right)+\frac{\partial f\left(x_{i}, u_{i}\right)}{\partial x_{i}}\delta x_{i}+\frac{\partial f\left(x_{i}, u_{i}\right)}{\partial u_{i}}\delta u_{i} \\ &\triangleq f\left(x_{i}, u_{i}\right)+A_{i}\delta x_{i}+B_{i}\delta u_{i}\tag{7} \end{align*} f(xi,ui)+δf(xi,ui)=f(xi+δxi,ui+δui)≈f(xi,ui)+∂xi∂f(xi,ui)δxi+∂ui∂f(xi,ui)δui≜f(xi,ui)+Aiδxi+Biδui(7)
状态值函数展开,这里也很好理解,注意这个 p i T p_{i}^T piT和 P i P_{i} Pi,后面backward pass的时候通过更新它们俩来实现状态递推
J ^ i ( x i ) + δ J ^ i ( x i ) = J ^ i ( x i + δ x i ) ≈ J ^ i ( x i ) + ∂ J ^ i ( x i ) ∂ x i δ x i + 1 2 δ x i T ∂ 2 J ^ i ( x i ) ∂ x i 2 δ x i ≜ J ^ i ( x i ) + p i T δ x i + 1 2 δ x i T P i δ x i \begin{align*} \hat{J}_{i}(x_{i}) + \delta \hat{J}_{i}(x_{i}) &= \hat{J}_{i}(x_{i} + \delta x_{i}) \\ &\approx \hat{J}_{i}(x_{i}) + \frac{\partial \hat{J}_{i}(x_{i})}{\partial x_{i}} \delta x_{i} + \frac{1}{2} \delta x_{i}^T \frac{\partial^2 \hat{J}_{i}(x_{i})}{\partial x_{i}^2} \delta x_{i} \\ &\triangleq \hat{J}_{i}(x_{i}) + p_{i}^T \delta x_{i} + \frac{1}{2} \delta x_{i}^T P_{i} \delta x_{i}\tag{8} \end{align*} J^i(xi)+δJ^i(xi)=J^i(xi+δxi)≈J^i(xi)+∂xi∂J^i(xi)δxi+21δxiT∂xi2∂2J^i(xi)δxi≜J^i(xi)+piTδxi+21δxiTPiδxi(8)
动作值函数展开,常规的泰勒一阶展开和二阶展开,没有什么难理解的
J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) ≈ J ~ i ( x i , u i ) + ∂ J ~ i ( x i , u i ) ∂ x i δ x i + ∂ J ~ i ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i ∂ x i δ x i ≜ J ~ i ( x i , u i ) + Q x i T δ x i + Q u i T δ u i + 1 2 δ x i T Q x i 2 δ x i + 1 2 δ u i T Q u i 2 δ u i + 1 2 δ x i T Q x i u i δ u i + 1 2 δ u i T Q u i x i T δ x i ( 9 ) \begin{align*} \tilde{J}_{i}\left(x_{i},u_{i}\right)+\delta\tilde{J}_{i}\left(x_{i},u_{i}\right) &= \tilde{J}_{i}\left(x_{i}+\delta x_{i},u_{i}+\delta u_{i}\right) \\ &\approx \tilde{J}_{i}\left(x_{i},u_{i}\right)+\frac{\partial\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}}\delta x_{i}+\frac{\partial\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}^{2}}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}^{2}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial x_{i}\partial u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2}\tilde{J}_{i}\left(x_{i},u_{i}\right)}{\partial u_{i}\partial x_{i}}\delta x_{i} \\ &\triangleq \tilde{J}_{i}\left(x_{i},u_{i}\right)+Q_{x_{i}}^{T}\delta x_{i}+Q_{u_{i}}^{T}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T}Q_{x_{i}^{2}}\delta x_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T}Q_{u_{i}^{2}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T}Q_{x_{i} u_{i}}\delta u_{i} + \frac{1}{2}\delta u_{i}^{T}Q_{u_{i} x_{i}}^{T}\delta x_{i} \end{align*}\qquad(9) J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)≈J~i(xi,ui)+∂xi∂J~i(xi,ui)δxi+∂ui∂J~i(xi,ui)δui+21δxiT∂xi2∂2J~i(xi,ui)δxi+21δuiT∂ui2∂2J~i(xi,ui)δui+21δxiT∂xi∂ui∂2J~i(xi,ui)δui+21δuiT∂ui∂xi∂2J~i(xi,ui)δxi≜J~i(xi,ui)+QxiTδxi+QuiTδui+21δxiTQxi2δxi+21δuiTQui2δui+21δxiTQxiuiδui+21δuiTQuixiTδxi(9)
根据公式(5),对 J ~ ( x , u ) \tilde{J}\left( x,u \right) J~(x,u)做泰勒展开有:
J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) = l ( x i + δ x i , u i + δ u i ) + J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) ( 10 ) \begin{align*} \tilde{J}_{i}(x_{i}, u_{i}) + \delta \tilde{J}_{i}(x_{i}, u_{i}) &= \tilde{J}_{i}(x_{i} + \delta x_{i}, u_{i} + \delta u_{i}) \\ &= l(x_{i} + \delta x_{i}, u_{i} + \delta u_{i}) + \hat{J}_{i+1}(f(x_{i} + \delta x_{i}, u_{i} + \delta u_{i})) \end{align*}\qquad(10) J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)=l(xi+δxi,ui+δui)+J^i+1(f(xi+δxi,ui+δui))(10)
其中,即时代价展开为:
l ( x i + δ x i , u i + δ u i ) = l ( x i , u i ) + δ l ( x i , u i ) ≈ l ( x i , u i ) + ∂ l ( x i , u i ) ∂ x i δ x i + ∂ l ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 l ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 l ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 l ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 l ( x i , u i ) ∂ u i ∂ x i δ x i ≜ l ( x i , u i ) + l x i δ x i + l u i δ u i + 1 2 δ x i T l x i 2 δ x i + 1 2 δ u i T l u i 2 δ u i + 1 2 δ x i T l x i u i δ u i + 1 2 δ u i T l u i x i δ x i ( 11 ) \begin{align*} l(x_{i}+\delta x_{i},u_{i}+\delta u_{i}) &= l(x_{i},u_{i})+\delta l(x_{i},u_{i}) \\ &\approx l(x_{i},u_{i})+\frac{\partial l(x_{i},u_{i})}{\partial x_{i}}\delta x_{i}+\frac{\partial l(x_{i},u_{i})}{\partial u_{i}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial x_{i}^{2}}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial u_{i}^{2}}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial x_{i}\partial u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T}\frac{\partial^{2} l(x_{i},u_{i})}{\partial u_{i}\partial x_{i}}\delta x_{i} \\ &\triangleq l(x_{i},u_{i})+l_{x_{i}}\delta x_{i}+l_{u_{i}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T} l_{x_{i}^{2}}\delta x_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T} l_{u_{i}^{2}}\delta u_{i}+\frac{1}{2}\delta x_{i}^{T} l_{x_{i} u_{i}}\delta u_{i}+\frac{1}{2}\delta u_{i}^{T} l_{u_{i} x_{i}}\delta x_{i} \end{align*}\qquad(11) l(xi+δxi,ui+δui)=l(xi,ui)+δl(xi,ui)≈l(xi,ui)+∂xi∂l(xi,ui)δxi+∂ui∂l(xi,ui)δui+21δxiT∂xi2∂2l(xi,ui)δxi+21δuiT∂ui2∂2l(xi,ui)δui+21δxiT∂xi∂ui∂2l(xi,ui)δui+21δuiT∂ui∂xi∂2l(xi,ui)δxi≜l(xi,ui)+lxiδxi+luiδui+21δxiTlxi2δxi+21δuiTlui2δui+21δxiTlxiuiδui+21δuiTluixiδxi(11)
下一时刻的状态值函数展开为(这里参考式(8)):
J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) = J ^ i + 1 ( f ( x i , u i ) + δ f ( x i , u i ) ) ≈ J ^ i + 1 ( f ( x i , u i ) ) + ∂ J ^ i + 1 ( x i + 1 ) ∂ x i + 1 δ f ( x i , u i ) + 1 2 δ f ( x i , u i ) T ∂ 2 J ^ i + 1 ( x i + 1 ) ∂ x i + 1 2 δ f ( x i , u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T ( A i δ x i + B i δ u i ) + 1 2 ( A i δ x i + B i δ u i ) T P i + 1 ( A i δ x i + B i δ u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T A i δ x i + p i + 1 T B i δ u i + 1 2 δ x i T A i T P i + 1 A i δ x i + 1 2 δ x i T A i T P i + 1 B i δ u i + 1 2 δ u i T B i T P i + 1 A i δ x i + 1 2 δ u i T B i T P i + 1 B i δ u i ( 12 ) \begin{align*} \hat{J}_{i+1}\left(f\left(x_{i}+\delta x_{i}, u_{i}+\delta u_{i}\right)\right) &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)+\delta f\left(x_{i}, u_{i}\right)\right) \\ &\approx \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+\frac{\partial\hat{J}_{i+1}\left(x_{i+1}\right)}{\partial x_{i+1}}\delta f\left(x_{i}, u_{i}\right) \\ &\quad+\frac{1}{2}\delta f\left(x_{i}, u_{i}\right)^{T}\frac{\partial^{2}\hat{J}_{i+1}\left(x_{i+1}\right)}{\partial x_{i+1}^{2}}\delta f\left(x_{i}, u_{i}\right) \\ &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+p_{i+1}^{T}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right) \\ &\quad+\frac{1}{2}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right)^{T} P_{i+1}\left(A_{i}\delta x_{i}+B_{i}\delta u_{i}\right) \\ &= \hat{J}_{i+1}\left(f\left(x_{i}, u_{i}\right)\right)+p_{i+1}^{T} A_{i}\delta x_{i}+p_{i+1}^{T} B_{i}\delta u_{i} \\ &\quad+\frac{1}{2}\delta x_{i}^{T} A_{i}^{T} P_{i+1} A_{i}\delta x_{i}+\frac{1}{2}\delta x_{i}^{T} A_{i}^{T} P_{i+1} B_{i}\delta u_{i} \\ &\quad+\frac{1}{2}\delta u_{i}^{T} B_{i}^{T} P_{i+1} A_{i}\delta x_{i}+\frac{1}{2}\delta u_{i}^{T} B_{i}^{T} P_{i+1} B_{i}\delta u_{i} \end{align*}\qquad(12) J^i+1(f(xi+δxi,ui+δui))=J^i+1(f(xi,ui)+δf(xi,ui))≈J^i+1(f(xi,ui))+∂xi+1∂J^i+1(xi+1)δf(xi,ui)+21δf(xi,ui)T∂xi+12∂2J^i+1(xi+1)δf(xi,ui)=J^i+1(f(xi,ui))+pi+1T(Aiδxi+Biδui)+21(Aiδxi+Biδui)TPi+1(Aiδxi+Biδui)=J^i+1(f(xi,ui))+pi+1TAiδxi+pi+1TBiδui+21δxiTAiTPi+1Aiδxi+21δxiTAiTPi+1Biδui+21δuiTBiTPi+1Aiδxi+21δuiTBiTPi+1Biδui(12)
其中, p i = ∂ J ^ i ( x i ) ∂ x i p_{i} = \frac{\partial \hat{J}_{i}(x_{i})}{\partial x_{i}} pi=∂xi∂J^i(xi), P i = ∂ 2 J ^ i ( x i ) ∂ x i 2 P_{i} = \frac{\partial^2 \hat{J}_{i}(x_{i})}{\partial x_{i}^2} Pi=∂xi2∂2J^i(xi)。 联合(8)(9)(10)(11)(12)可得 J ~ i ( x i , u i ) \tilde{J}_{i}(x_{i}, u_{i}) J~i(xi,ui)在 ( x i , u i ) (x_{i}, u_{i}) (xi,ui)附近泰勒展开的多项式系数矩阵:
Q x i = l x i + p i + 1 T A i Q u i = l u i + p i + 1 T B i Q x i 2 = l x i 2 + A i T P i + 1 A i Q x i u i = l x i u i + A i T P i + 1 B i Q u i x i = l u i x i + B i T P i + 1 A i Q u i 2 = l u i 2 + B i T P i + 1 B i \begin{align*} Q_{x_{i}} &= l_{x_{i}} + p_{i+1}^{T} A_{i} \\ Q_{u_{i}} &= l_{u_{i}} + p_{i+1}^{T} B_{i} \\ Q_{x_{i}^{2}} &= l_{x_{i}^{2}} + A_{i}^{T} P_{i+1} A_{i} \\ Q_{x_{i} u_{i}} &= l_{x_{i} u_{i}} + A_{i}^{T} P_{i+1} B_{i} \\ Q_{u_{i} x_{i}} &= l_{u_{i} x_{i}} + B_{i}^{T} P_{i+1} A_{i} \\ Q_{u_{i}^{2}} &= l_{u_{i}^{2}} + B_{i}^{T} P_{i+1} B_{i} \tag{13} \end{align*} QxiQuiQxi2QxiuiQuixiQui2=lxi+pi+1TAi=lui+pi+1TBi=lxi2+AiTPi+1Ai=lxiui+AiTPi+1Bi=luixi+BiTPi+1Ai=lui2+BiTPi+1Bi(13)
一般来说有 Q x i u i = Q x i u i T Q_{x_{i} u_{i}}=Q_{x_{i} u_{i}}^{T} Qxiui=QxiuiT,根据式(9),如果当前状态 x i x_{i} xi,输入是 u i u_{i} ui,希望 x i + δ x i x_{i} + \delta x_{i} xi+δxi的代价值函数 J ~ i ( x i + δ x i , u i + δ u i ) \tilde{J}_{i}\left(x_{i}+\delta x_{i},u_{i}+\delta u_{i}\right) J~i(xi+δxi,ui+δui)最小,则需要使 δ J ~ i ( x i , u i ) \delta \tilde{J}_{i}(x_{i}, u_{i}) δJ~i(xi,ui)最小,有:
δ J ^ i ( x i ) = min δ u i δ J ~ i ( x i , u i ) = min δ u i { Q x i T δ x i + Q u i T δ u i + 1 2 δ x i T Q x i 2 δ x i + 1 2 δ u i T Q u i 2 δ u i + 1 2 δ x i T Q x i u i δ u i + 1 2 δ u i T Q u i x i T δ x i } = min δ u i δ J ~ i ( x i , u i ) ( 14 ) \begin{align*} \delta \hat{J}_i(x_i) &= \min_{\delta u_i} \delta \tilde{J}_i(x_i, u_i) \\ &= \min_{\delta u_i} \left\{ Q_{x_i}^T \delta x_i + Q_{u_i}^T \delta u_i + \frac{1}{2} \delta x_i^T Q_{x_i^2} \delta x_i + \frac{1}{2} \delta u_i^T Q_{u_i^2} \delta u_i \right. \\ &\quad \left. + \frac{1}{2} \delta x_i^T Q_{x_i u_i} \delta u_i + \frac{1}{2} \delta u_i^T Q_{u_i x_i}^T \delta x_i \right\} \\ &= \min_{\delta u_i} \delta \tilde{J}_i(x_i, u_i) \end{align*} \qquad(14) δJ^i(xi)=δuiminδJ~i(xi,ui)=δuimin{QxiTδxi+QuiTδui+21δxiTQxi2δxi+21δuiTQui2δui+21δxiTQxiuiδui+21δuiTQuixiTδxi}=δuiminδJ~i(xi,ui)(14)
δ J ~ i ( x i , u i ) \delta \tilde{J}_{i}(x_{i}, u_{i}) δJ~i(xi,ui)对 δ u i \delta u_{i} δui求偏导,使其取得最小值得到 δ J ^ i ( x i ) \delta \hat{J}_i(x_i) δJ^i(xi),并让其等于0有:
∂ δ J ~ i ( x i ) ∂ δ u i = Q u i + 1 2 Q u i x i T δ x i + Q u i 2 δ u i + 1 2 Q x i u i δ x i = Q u i + Q u i x i δ x i + Q u i 2 δ u i = 0 ( 15 ) \begin{align*} \frac{\partial \delta \tilde{J}_i(x_i)}{\partial \delta u_i} &= Q_{u_i} + \frac{1}{2} Q_{u_i x_i}^T \delta x_i + Q_{u_i^2} \delta u_i + \frac{1}{2} Q_{x_i u_i} \delta x_i \\ &= Q_{u_i} + Q_{u_i x_i} \delta x_i + Q_{u_i^2} \delta u_i = 0 \end{align*} \qquad(15) ∂δui∂δJ~i(xi)=Qui+21QuixiTδxi+Qui2δui+21Qxiuiδxi=Qui+Quixiδxi+Qui2δui=0(15)
解出来在 i i i时刻的最优控制输入的变化量:
δ u ^ i ∗ = − Q u i 2 − 1 ( Q u i + Q u i x i δ x i ) ≜ K i δ x i + d i ( 16 ) \begin{align*} \delta \hat{u}_i^* &= -Q_{u_i^2}^{-1} \left( Q_{u_i} + Q_{u_i x_i} \delta x_i \right) \\ &\triangleq K_i \delta x_i + d_i \end{align*} \qquad(16) δu^i∗=−Qui2−1(Qui+Quixiδxi)≜Kiδxi+di(16)
iLQR的核心思想就是在给定初解 u i u_i ui的基础上,求最优的补偿输入 δ u ^ i ∗ \delta \hat{u}_i^* δu^i∗ 使得在 u i + δ u ^ i ∗ u_i+\delta \hat{u}_i^* ui+δu^i∗的作用下获得更优的解,并循环这个过程。
K i K_i Ki为反馈增益, d i d_i di为前馈增益。将式(16)代入到式(9)整理得:
δ J ~ i ( x i , u i ∗ ) = Q x i T δ x i + Q u i T ( K i δ x i + d i ) + 1 2 δ x i T Q x i 2 δ x i + 1 2 ( K i δ x i + d i ) T Q u i 2 ( K i δ x i + d i ) + 1 2 δ x i T Q x i u i ( K i δ x i + d i ) + 1 2 ( K i δ x i + d i ) T Q u i x i δ x i = 1 2 δ x i T [ Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i ] δ x i + [ Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u i d i ] T δ x i + 1 2 d i T Q u i 2 d i + Q u i T d i ( 17 ) \begin{align*} \delta \tilde{J}_i(x_i, u_i^*) &=Q_{x_i}^T \delta x_i + Q_{u_i}^T (K_i \delta x_i + d_i) + \frac{1}{2} \delta x_i^T Q_{x_i^2} \delta x_i \\ &\quad+ \frac{1}{2} (K_i \delta x_i + d_i)^T Q_{u_i^2} (K_i \delta x_i + d_i) \\ &\quad + \frac{1}{2} \delta x_i^T Q_{x_i u_i} (K_i \delta x_i + d_i) + \frac{1}{2} (K_i \delta x_i + d_i)^T Q_{u_i x_i} \delta x_i \\ &= \frac{1}{2} \delta x_i^T [Q_{x_i^2} + K_i^T Q_{u_i^2} K_i + Q_{x_i u_i} K_i + K_i^T Q_{u_i x_i}] \delta x_i \\ &\quad + [Q_{x_i} + K_i^T Q_{u_i^2} d_i + Q_{x_i u_i} d_i + K_i^T Q_{u_i} d_i]^T \delta x_i \\ &\quad + \frac{1}{2} d_i^T Q_{u_i^2} d_i + Q_{u_i}^T d_i \end{align*} \qquad(17) δJ~i(xi,ui∗)=QxiTδxi+QuiT(Kiδxi+di)+21δxiTQxi2δxi+21(Kiδxi+di)TQui2(Kiδxi+di)+21δxiTQxiui(Kiδxi+di)+21(Kiδxi+di)TQuixiδxi=21δxiT[Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi]δxi+[Qxi+KiTQui2di+Qxiuidi+KiTQuidi]Tδxi+21diTQui2di+QuiTdi(17)
这里需要注意的是 δ J ~ i ( x i , u i ∗ ) = δ J ^ i ( x i ) \delta \tilde{J}_i(x_i, u_i^*)=\delta \hat{J}_i(x_i) δJ~i(xi,ui∗)=δJ^i(xi),结合式(8)式(17)可以得到 p i p_{i} pi和 P i P_{i} Pi的递推关系式:
p i = Q x i + K i T Q u i 2 d i + Q x i u i d i + K i T Q u P i = Q x i 2 + K i T Q u i 2 K i + Q x i u i K i + K i T Q u i x i Δ J ^ i = 1 2 d i T Q u i 2 d i + Q u i T d i ( 18 ) \begin{align*} p_i &= Q_{x_i} + K_i^T Q_{u_i^2} d_i + Q_{x_i u_i} d_i + K_i^T Q_u \\ P_i &= Q_{x_i^2} + K_i^T Q_{u_i^2} K_i + Q_{x_i u_i} K_i + K_i^T Q_{u_i x_i} \\ \Delta \hat{J}_i &= \frac{1}{2} d_i^T Q_{u_i^2} d_i + Q_{u_i}^T d_i \end{align*} \qquad(18) piPiΔJ^i=Qxi+KiTQui2di+Qxiuidi+KiTQu=Qxi2+KiTQui2Ki+QxiuiKi+KiTQuixi=21diTQui2di+QuiTdi(18)
其中, p N T = ∂ l f ( x N ) ∂ x N p_{N}^T = \frac{\partial l_f(x_{N})}{\partial x_{N}} pNT=∂xN∂lf(xN), P N = ∂ 2 l f ( x N ) ∂ x N 2 P_{N} = \frac{\partial^2 l_f(x_{N})}{\partial x_{N}^2} PN=∂xN2∂2lf(xN); Δ J ^ i \Delta \hat{J}_i ΔJ^i是 J ^ i ( x i ) \hat{J}_i(x_i) J^i(xi)二阶泰勒展开的高阶无穷小项,与 δ x \delta x δx无关,不影响 δ u \delta u δu的计算。
从上面的推导可以看出,如果确定 p N T p_{N}^T pNT和 P N P_{N} PN,就可以通过式(18)来递推得到 p N − 1 T p_{N-1}^T pN−1T和 P N − 1 P_{N-1} PN−1,之所以有backward pass,就是为了从 N − 1 N-1 N−1时刻,一直倒推到当前时刻,求出每个时刻的最优控制输入的变化量。
1.3 ilqr算法迭代过程
- Backward pass
初始化 p N T = ∂ l f ( x N ) ∂ x N p_{N}^T = \frac{\partial l_f(x_{N})}{\partial x_{N}} pNT=∂xN∂lf(xN), P N = ∂ 2 l f ( x N ) ∂ x N 2 P_{N} = \frac{\partial^2 l_f(x_{N})}{\partial x_{N}^2} PN=∂xN2∂2lf(xN);
f o r for for i = N − 1 , . . . , 0 i = N-1,...,0 i=N−1,...,0
根据式(13)式(18)更新 p i ← p i + 1 p_{i} \leftarrow p_{i+1} pi←pi+1, P i ← P i + 1 P_{i} \leftarrow P_{i+1} Pi←Pi+1
根据式(16)计算 K i = Q u i 2 − 1 Q u i x i K_i = Q_{u_i^2}^{-1} Q_{u_i x_i} Ki=Qui2−1Quixi, d i = Q u i 2 − 1 Q u i d_i = Q_{u_i^2}^{-1} Q_{u_i} di=Qui2−1Qui - Forward pass
初始条件 x 0 x_{0} x0,对于任意迭代次数 j j j有 x 0 j = x 0 j − 1 x_{0}^j = x_{0}^{j-1} x0j=x0j−1,这里也好理解,每次迭代都是基于当前时刻的状态,所以不会变
f o r for for i = 0 , . . . , N − 1 i = 0,...,N-1 i=0,...,N−1
δ x i = x i j − x i j − 1 \delta x_i = x_i^j - x_i^{j-1} δxi=xij−xij−1, δ x i \delta x_i δxi的含义就是当前轮迭代解和上一轮迭代解在 i i i时刻的差值(是不是有牛顿法那味了)
δ u ^ i ∗ = K i δ x i + d i \delta \hat{u}_i^* = K_i \delta x_i + d_i δu^i∗=Kiδxi+di,更新控制输入 u i j = u i j − 1 + δ u ^ i ∗ u_{i}^j = u_{i}^{j-1} + \delta\hat{u}_i^* uij=uij−1+δu^i∗
更新状态 x i + 1 j = f ( x i j , u i j ) x_{i+1}^j = f(x_{i}^{j},u_{i}^{j}) xi+1j=f(xij,uij) - 满足 ∣ J ( j + 1 ) − J ( j ) J ( j ) ∣ < ϵ \left| \frac{J^{(j+1)} - J^{(j)}}{J^{(j)}} \right| < \epsilon J(j)J(j+1)−J(j) <ϵ或者迭代达到最大轮次终止迭代。
二. ilqr实践代码
参考这位博主: 最优化理论与自动驾驶(十一):基于iLQR的自动驾驶轨迹跟踪算法(c++和python版本)