Paper Reading Notes: Denoising Diffusion Implicit Models
1. References
Paper: "Denoising Diffusion Implicit Models"
Venue: ICLR 2021
https://iclr.cc/virtual/2021/poster/2804
Paper link: https://arxiv.org/abs/2010.02502
Code link: https://github.com/ermongroup/ddim
2. Differences in Notation
In the DDPM paper, "Denoising Diffusion Probabilistic Models", the forward-process posterior is $q(x_{t-1}|x_t,x_0)\sim N\big(x_{t-1};\widetilde{\mu}_t(x_t,x_0),\sigma_t^2 I\big)$, where $\widetilde{\mu}_t(x_t,x_0)$ and $\sigma_t$ are given in formula (1).
\begin{equation}
\begin{split}
\sigma_t&=\sqrt{\frac{\beta_t\cdot (1-\bar\alpha_{t-1})}{1-\bar\alpha_t}}\\
\widetilde{\mu}_t(x_t,x_0)&=\frac{\sqrt{\alpha_t}\cdot(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\cdot x_t+\frac{\beta_t\cdot \sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t} \cdot x_0
\end{split}
\end{equation}
The DDIM paper, "Denoising Diffusion Implicit Models", redefines the notation. Specifically, it writes $\alpha_t$ for the quantity that DDPM denotes by $\bar\alpha_t$, where in DDPM
\begin{equation}
\begin{split}
\bar \alpha_t=\prod_{i=1}^{t}\alpha_i
\end{split}
\end{equation}
As a result, several expressions change in DDIM. For example, $\beta_t$ takes the form shown in formula (3).
\begin{equation}
\begin{split}
\beta_t&=1-\alpha_t \quad \text{(DDPM)}\\
&=1-\frac{\alpha_t}{\alpha_{t-1}} \quad \text{(DDIM)}
\end{split}
\end{equation}
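The change of notation can be checked with a small round trip: build the cumulative (DDIM-style) $\alpha_t$ from a list of $\beta_t$, then recover $\beta_t$ with the DDIM form of formula (3). This is only a sketch; the toy $\beta$ values and the function names are assumptions, and it uses the convention $\alpha_0=1$.

```python
def cumulative_alphas(betas):
    """DDIM-style alpha_0..alpha_T: alpha_0 = 1 and alpha_t = prod_{i<=t} (1 - beta_i)."""
    alphas = [1.0]
    for beta in betas:
        alphas.append(alphas[-1] * (1.0 - beta))
    return alphas

def betas_from_cumulative(alphas):
    """Invert the map with beta_t = 1 - alpha_t / alpha_{t-1} (formula (3), DDIM form)."""
    return [1.0 - alphas[t] / alphas[t - 1] for t in range(1, len(alphas))]

betas = [0.1, 0.2, 0.3]                  # toy values chosen only for illustration
ddim_alphas = cumulative_alphas(betas)   # approximately [1.0, 0.9, 0.72, 0.504]
recovered = betas_from_cumulative(ddim_alphas)
print(ddim_alphas)
print(recovered)
```

Up to floating-point rounding, `recovered` matches the original `betas`, which is what the two lines of formula (3) state.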
The variance and mean of the forward-process posterior $q(x_{t-1}|x_t,x_0)$ are given in formulas (4) and (5), respectively.
\begin{equation}
\begin{split}
\sigma_t^2&=\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\cdot \beta_t \quad \text{(DDPM)}\\
&=\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot \Big(1-\frac{\alpha_t}{\alpha_{t-1}}\Big) \quad \text{(DDIM)}
\end{split}
\end{equation}
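As a quick numerical illustration of formula (4), the sketch below evaluates the variance once in DDPM notation and once in DDIM notation for the same pair of timesteps. The numbers and function names are assumptions made up for the example.

```python
def sigma2_ddpm(alpha_step, abar_prev, abar_t):
    """Formula (4), DDPM notation: alpha_step is the per-step alpha_t,
    abar_prev and abar_t are the cumulative products at t-1 and t."""
    beta_t = 1.0 - alpha_step
    return (1.0 - abar_prev) / (1.0 - abar_t) * beta_t

def sigma2_ddim(a_prev, a_t):
    """Formula (4), DDIM notation: a_prev and a_t are already cumulative."""
    return (1.0 - a_prev) / (1.0 - a_t) * (1.0 - a_t / a_prev)

# Toy numbers (assumed): cumulative products 0.72 at t-1 and 0.504 at t,
# so the per-step DDPM alpha_t is 0.504 / 0.72 = 0.7.
a_prev, a_t = 0.72, 0.504
v_ddpm = sigma2_ddpm(alpha_step=a_t / a_prev, abar_prev=a_prev, abar_t=a_t)
v_ddim = sigma2_ddim(a_prev, a_t)
print(v_ddpm, v_ddim)
```

The two values agree up to rounding, and the variance stays below $1-\alpha_{t-1}$, so the square root $\sqrt{1-\alpha_{t-1}-\sigma_t^2}$ that appears in the mean is well defined.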
\begin{equation}
\begin{split}
\widetilde{\mu}_t(x_t,x_0)
&=\frac{\sqrt{\alpha_t}\cdot(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\cdot x_t+\frac{\beta_t\cdot \sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t} \cdot x_0 \quad \text{(DDPM)}\\
&=\frac{\sqrt{\alpha_t}\cdot(1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\cdot x_t+\Big(1-\frac{\alpha_t}{\alpha_{t-1}}\Big)\cdot\frac{\sqrt{\alpha_{t-1}}}{1-\alpha_t}\cdot x_0 \quad \text{(DDIM)}\\
&=\sqrt{\frac{\alpha_t\cdot (1-\alpha_{t-1})^2}{\alpha_{t-1} \cdot (1-\alpha_t)^2}}\cdot x_t+\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}}\cdot\frac{\sqrt{\alpha_{t-1}}}{1-\alpha_t}\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_{t}}\cdot \frac{\alpha_t-\alpha_t \cdot \alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot \alpha_{t}}} \cdot x_t+\frac{\alpha_{t-1}-\alpha_t}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)}\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_{t}}\cdot \frac{\alpha_t+\alpha_{t-1}-\alpha_{t-1}-\alpha_t \cdot \alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot \alpha_{t}}} \cdot x_t+\frac{\alpha_{t-1}-\alpha_t\cdot \alpha_{t-1}+\alpha_t\cdot \alpha_{t-1}-\alpha_t}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)}\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_{t}}\cdot \Big(1+\frac{\alpha_t-\alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot \alpha_{t}}\Big)}\cdot x_t+\frac{\alpha_{t-1}\cdot (1-\alpha_t)-\alpha_t\cdot (1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)}\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_{t}}\cdot \Big(1-\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}-\alpha_{t-1}\cdot \alpha_{t}}\Big)}\cdot x_t+ \bigg[ \sqrt{\alpha_{t-1}}-\frac{\alpha_t\cdot (1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1}{1-\alpha_{t}}\cdot \Big(1-\alpha_{t-1}-\frac{(\alpha_{t-1}-\alpha_t)\cdot (1-\alpha_{t-1})}{\alpha_{t-1}-\alpha_{t-1}\cdot \alpha_{t}}\Big)}\cdot x_t+ \bigg[ \sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t^2\cdot (1-\alpha_{t-1})^2}}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1}{1-\alpha_{t}}\cdot \Big(1-\alpha_{t-1}-\underbrace{\frac{(\alpha_{t-1}-\alpha_t)\cdot (1-\alpha_{t-1})}{\alpha_{t-1}\cdot (1- \alpha_{t})}}_{=\sigma_t^2}\Big)}\cdot x_t+ \bigg[ \sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot (1-\alpha_{t-1})\cdot(\alpha_t-\alpha_t\cdot \alpha_{t-1})}}{\sqrt{\alpha_{t-1}}\cdot (1-\alpha_t)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1}{1-\alpha_{t}}\cdot \Big(1-\alpha_{t-1}-\sigma_t^2 \Big)}\cdot x_t+ \bigg[ \sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot (1-\alpha_{t-1})\cdot(\alpha_t + \alpha_{t-1} -\alpha_{t-1}-\alpha_t\cdot \alpha_{t-1})}}{\sqrt{\alpha_{t-1}\cdot(1-\alpha_t)}\cdot \sqrt{1-\alpha_t}} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}} \cdot \frac{\sqrt{\alpha_t \cdot \big(\alpha_t-\alpha_{t-1}+\alpha_{t-1}\cdot(1-\alpha_t)\big)}}{\sqrt{\alpha_{t-1}\cdot(1-\alpha_t)}} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}} \cdot \sqrt{\frac{\alpha_t \cdot \big(\alpha_t-\alpha_{t-1}+\alpha_{t-1}\cdot(1-\alpha_t)\big)}{\alpha_{t-1}\cdot(1-\alpha_t)}} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}} \cdot \sqrt{\alpha_t\cdot \Big(1+\frac{\alpha_t-\alpha_{t-1}}{\alpha_{t-1}\cdot(1-\alpha_t)}\Big)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{1}{\sqrt{1-\alpha_t}} \cdot \sqrt{(1-\alpha_{t-1}) \cdot \alpha_t\cdot \Big(1-\frac{\alpha_{t-1} - \alpha_t}{\alpha_{t-1}\cdot(1-\alpha_t)}\Big)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{1}{\sqrt{1-\alpha_t}} \cdot \sqrt{\alpha_t\cdot \Big(1-\alpha_{t-1}-\underbrace{\frac{(\alpha_{t-1} - \alpha_t)\cdot (1-\alpha_{t-1})}{\alpha_{t-1}\cdot(1-\alpha_t)}}_{=\sigma_t^2}\Big)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{1}{\sqrt{1-\alpha_t}} \cdot \sqrt{\alpha_t\cdot (1-\alpha_{t-1}-\sigma_t^2)} \bigg] \cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{\sqrt{\alpha_t\cdot (1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}} \bigg] \cdot x_0
\end{split}
\end{equation}
Therefore, in DDIM notation, the forward-process posterior is $q(x_{t-1}|x_t,x_0)\sim N\Big(x_{t-1};\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+ \bigg[\sqrt{\alpha_{t-1}}- \frac{\sqrt{\alpha_t\cdot (1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}} \bigg] \cdot x_0,\ \sigma_t^2 I\Big)$.
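The long derivation in formula (5) can be spot-checked numerically. The sketch below compares the coefficients of $x_t$ and $x_0$ right after the DDIM substitution (second line of (5)) with the final form written in terms of $\sigma_t^2$, using $\sigma_t^2$ from formula (4). The toy values and function names are assumptions for illustration; agreement at one point is a sanity check, not a proof.

```python
import math

def mean_coeffs_ddpm_form(a_prev, a_t):
    """Coefficients of x_t and x_0 right after substituting DDIM notation
    (second line of formula (5)); a_prev and a_t are cumulative alphas."""
    coef_xt = math.sqrt(a_t) * (1.0 - a_prev) / (math.sqrt(a_prev) * (1.0 - a_t))
    coef_x0 = (1.0 - a_t / a_prev) * math.sqrt(a_prev) / (1.0 - a_t)
    return coef_xt, coef_x0

def mean_coeffs_sigma_form(a_prev, a_t, sigma2):
    """Coefficients of x_t and x_0 in the final line of formula (5),
    expressed through sigma_t^2."""
    coef_xt = math.sqrt((1.0 - a_prev - sigma2) / (1.0 - a_t))
    coef_x0 = math.sqrt(a_prev) - math.sqrt(a_t * (1.0 - a_prev - sigma2) / (1.0 - a_t))
    return coef_xt, coef_x0

a_prev, a_t = 0.72, 0.504   # toy cumulative values with a_prev > a_t (assumed)
sigma2 = (1.0 - a_prev) / (1.0 - a_t) * (1.0 - a_t / a_prev)   # formula (4), DDIM form
first_form = mean_coeffs_ddpm_form(a_prev, a_t)
final_form = mean_coeffs_sigma_form(a_prev, a_t, sigma2)
print(first_form)
print(final_form)
```

Both pairs coincide up to floating-point rounding, which is consistent with the first and last lines of (5) describing the same mean.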