
[PyTorch][chapter 28][Hung-yi Lee Deep Learning][Diffusion Model-3]

article 2025/2/26 20:22:32

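Preface:

Generative models have developed roughly in this order: AE -> VAE -> GAN -> WGAN -> Diffusion

This post focuses on deriving the three formulas used by the Diffusion model.

The parts marked in red rely on the reparameterization (resampling) trick from the VAE.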

Paper: https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf

Contents:

Forward Diffusion Process
Bayes' formula
Reverse diffusion process
Code implementation

1 Forward Process: the principle of diffusion

1.1 Learning objectives

Understand where the following two formulas come from:

$x_t=\sqrt{1-\beta_t}x_{t-1} +\sqrt{\beta_t}\epsilon_t$

$x_t=\sqrt{\bar{\alpha_t}}x_0+\sqrt{(1-\bar{\alpha_t})}\epsilon$

where:

$\beta_t$ : a hyperparameter that increases with $t$, ranging from 0.0001 to 0.02: $\beta_t=\beta_1+(\beta_T-\beta_1)\frac{t-1}{T-1}$

$\alpha_t=1-\beta_t$

$\bar{\alpha_t}=\prod_{i=1}^{t}\alpha_i$
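As a minimal sketch of these definitions, the snippet below builds a linear schedule and the cumulative products. It assumes $T=1000$ steps with $\beta_1=0.0001$ and $\beta_T=0.02$; the function names `linear_beta_schedule` and `cumulative_alphas` are illustrative and are not taken from the post's code.

```python
def linear_beta_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Return [beta_1, ..., beta_T], linearly spaced (list index 0 holds t=1)."""
    if T == 1:
        return [beta_1]
    return [beta_1 + (beta_T - beta_1) * (t - 1) / (T - 1) for t in range(1, T + 1)]


def cumulative_alphas(betas):
    """Return (alphas, alpha_bars) with alpha_t = 1 - beta_t and
    alpha_bar_t = alpha_1 * alpha_2 * ... * alpha_t."""
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return alphas, alpha_bars


betas = linear_beta_schedule(T=1000)  # T=1000 is an assumed step count
alphas, alpha_bars = cumulative_alphas(betas)
print(betas[0], betas[-1])            # smallest and largest noise level
print(alpha_bars[0], alpha_bars[-1])  # alpha_bar shrinks towards 0 as t grows
```

Because $\bar{\alpha_t}$ decays towards 0, the image at the last step keeps almost nothing of $x_0$ and is close to pure Gaussian noise.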

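1.2 Review of the Gaussian distribution

We start with some background on a random variable that follows a Gaussian distribution. Its probability density function is:

$f(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-u)^2}{2\sigma^2}\right)$

   $u$ : mean

   $\sigma^2$ : variance

If $X \sim N(u_X,\sigma_X^2)$ and $Y \sim N(u_Y,\sigma_Y^2)$ are independent normal random variables, then

   $U=X+Y \sim N(u_X+u_Y,\sigma_X^2+\sigma_Y^2)$ is also normally distributed.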
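The sum rule can be checked empirically. The sketch below is only a Monte Carlo illustration with arbitrarily chosen parameters ($u_X=1,\sigma_X=2,u_Y=-3,\sigma_Y=1.5$). It checks the mean and variance of the sum, not normality itself.

```python
import random

random.seed(0)
n = 100_000

# X ~ N(1, 2^2) and Y ~ N(-3, 1.5^2), drawn independently
xs = [random.gauss(1.0, 2.0) for _ in range(n)]
ys = [random.gauss(-3.0, 1.5) for _ in range(n)]
us = [x + y for x, y in zip(xs, ys)]

sample_mean = sum(us) / n
sample_var = sum((u - sample_mean) ** 2 for u in us) / n

# Theory: mean = 1 + (-3) = -2, variance = 4 + 2.25 = 6.25
print(sample_mean, sample_var)
```

The variances add, not the standard deviations. This is what later allows several independent noise terms to be merged into a single one.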

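1.3 The diffusion model

Diffusion is a process of gradually adding noise: the image at each step is generated from the image at the previous step plus the current noise.

This process can be viewed as a Markov chain whose transition probability is

$q(x_t|x_{t-1})=N(x_t;\sqrt{1-\beta_t}x_{t-1},\beta_tI)...(3)$

$\beta_1<\beta_2<...<\beta_T$

$\sqrt{1-\beta_t}x_{t-1}$ : the mean (expectation)

$\beta_tI$ : the variance (covariance matrix)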
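A single transition of this Markov chain can be sketched as follows. The image is represented as a flat list of pixel values. The helper name `forward_step` and the four noise levels are assumptions made for illustration only.

```python
import math
import random


def forward_step(x_prev, beta_t, rng=random):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I), element by element."""
    scale = math.sqrt(1.0 - beta_t)
    noise_std = math.sqrt(beta_t)
    return [scale * v + noise_std * rng.gauss(0.0, 1.0) for v in x_prev]


random.seed(0)
x = [0.5, -1.0, 0.25, 0.0]               # a toy "image" flattened to four pixels
for beta_t in (1e-4, 5e-3, 1e-2, 2e-2):  # increasing noise levels
    x = forward_step(x, beta_t)
print(x)
```

Each call needs the output of the previous call, which is why reaching a large $t$ this way takes $t$ sequential noise draws.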

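1.4 Improvement

Diffusing one step at a time with formula (2) is too slow. Can we obtain the image at an arbitrary time step directly from $x_0,\epsilon_0$, instead of generating new noise at every step?

Yes. Substituting the one-step formula into itself and merging the two independent Gaussian noise terms with the sum rule from 1.2 gives, with $\alpha_t=1-\beta_t$,

$x_t=\sqrt{\alpha_t}x_{t-1}+\sqrt{1-\alpha_t}\epsilon_t=\sqrt{\alpha_t\alpha_{t-1}}x_{t-2}+\sqrt{1-\alpha_t\alpha_{t-1}}\bar{\epsilon}$

and repeating this down to $x_0$ yields $x_t=\sqrt{\bar{\alpha_t}}x_0+\sqrt{1-\bar{\alpha_t}}\epsilon$.

In terms of the transition probability, (8) shows that

$q(x_t|x_0)=N(x_t;\sqrt{\bar{\alpha_t}}x_0,(1-\bar{\alpha_t})I)..(9)$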
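The closed form (9) can be sketched in code and checked against the step-by-step chain. The helper names `make_alpha_bars` and `q_sample` and the linear schedule are assumptions. The check propagates the mean coefficient and the variance of $x_t$ given $x_0$ through every step and compares them with $\sqrt{\bar{\alpha_T}}$ and $1-\bar{\alpha_T}$.

```python
import math
import random


def make_alpha_bars(betas):
    """alpha_bar_t = product of (1 - beta_i) for i = 1..t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I); t is 1-indexed.
    Returns the noisy sample together with the noise that was used."""
    a_bar = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e for v, e in zip(x0, eps)]
    return x_t, eps


T = 1000  # assumed number of steps
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = make_alpha_bars(betas)

# Propagate the one-step transition: mean coefficient and variance of x_t given x_0.
mean_coef, var = 1.0, 0.0
for b in betas:
    mean_coef *= math.sqrt(1.0 - b)
    var = (1.0 - b) * var + b

gap_mean = abs(mean_coef - math.sqrt(alpha_bars[-1]))
gap_var = abs(var - (1.0 - alpha_bars[-1]))
print(gap_mean, gap_var)  # both gaps should be at floating-point rounding level

x_t, eps = q_sample([0.5, -1.0, 0.25, 0.0], t=500, alpha_bars=alpha_bars)
```

With the closed form, a training loop can jump to any timestep with a single noise draw instead of iterating $t$ times.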

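2 Bayes' formula

The reverse process below applies Bayes' formula, so we briefly review it here. The form needed is the one with an extra conditioning variable:

$P(A|B,C)=\frac{P(B|A,C)P(A|C)}{P(B|C)}$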

3 Reverse diffusion process

3.1 Learning objectives:

For the reverse process, the key formulas to master are

$x_{t-1}=\frac{1}{\sqrt{\alpha_t}}(x_t-\frac{1-\alpha_t}{ \sqrt{1-\bar{\alpha_t}}}\epsilon)+\sigma_t z$

$z \sim N(0,I)$

$x_0=\frac{1}{\sqrt{\bar{\alpha_t}}}(x_t-\sqrt{1-\bar{\alpha_t}}\epsilon)$

where:

$\epsilon$ is the noise, predicted by the trained model: $\epsilon_{\theta}= f(x_t,t)$
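The two formulas can be sketched as plain functions. The names `predict_x0` and `reverse_step` are hypothetical, and the true noise stands in for the model output $\epsilon_{\theta}$. This section does not specify $\sigma_t$, so the example assumes $\sigma_t=\sqrt{\beta_t}$, one of the choices discussed in the DDPM paper.

```python
import math
import random


def predict_x0(x_t, eps, a_bar_t):
    """x_0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    return [(v - math.sqrt(1.0 - a_bar_t) * e) / math.sqrt(a_bar_t)
            for v, e in zip(x_t, eps)]


def reverse_step(x_t, eps, alpha_t, a_bar_t, sigma_t, rng=random):
    """x_{t-1} = (x_t - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
                 + sigma_t * z, with z ~ N(0, I)."""
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - a_bar_t)
    return [(v - coef * e) / math.sqrt(alpha_t) + sigma_t * rng.gauss(0.0, 1.0)
            for v, e in zip(x_t, eps)]


random.seed(0)
alpha_t, a_bar_t = 0.98, 0.5  # illustrative values only
x0 = [0.5, -1.0, 0.25]
eps = [random.gauss(0.0, 1.0) for _ in x0]
x_t = [math.sqrt(a_bar_t) * v + math.sqrt(1.0 - a_bar_t) * e for v, e in zip(x0, eps)]

x0_hat = predict_x0(x_t, eps, a_bar_t)  # recovers x0 when eps is the true noise
x_prev = reverse_step(x_t, eps, alpha_t, a_bar_t,
                      sigma_t=math.sqrt(1.0 - alpha_t))  # assumed sigma_t = sqrt(beta_t)
print(x0_hat, x_prev)
```

In real sampling `eps` comes from the network, so `predict_x0` gives only an estimate of $x_0$, and `reverse_step` is applied repeatedly from $t=T$ down to $t=1$.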

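3.2 Model

$x_t$ : the noisy image at time $t$, a known variable

$x_0$ : the image at the initial time, an unknown variable

$x_{t-1}$ : the variable at the previous time step, to be solved for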

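Now consider the reverse process. We are interested in the probability of $x_{t-1}$ given $x_t$ and $x_0$, i.e. $q(x_{t-1}|x_t,x_0)$. Applying Bayes' formula gives

$q(x_{t-1}|x_t,x_0)=\frac{q(x_t|x_{t-1},x_0)q(x_{t-1}|x_0)}{q(x_t|x_0)}$

By the Markov property of the forward process (the current state depends only on the previous one),

$q(x_t|x_{t-1},x_0)=q(x_t|x_{t-1})$

From formulas (3) and (9) we obtain the following three expressions:

$q(x_t|x_{t-1})=N(x_t;\sqrt{\alpha_t}x_{t-1},\beta_tI)$

$q(x_{t-1}|x_0)=N(x_{t-1};\sqrt{\bar{\alpha_{t-1}}}x_0,(1-\bar{\alpha_{t-1}})I)$

$q(x_t|x_0)=N(x_t;\sqrt{\bar{\alpha_t}}x_0,(1-\bar{\alpha_t})I)$

which finally gives

$q(x_{t-1}|x_t,x_0)\propto\exp\left(-\frac{1}{2}\left[\frac{(x_t-\sqrt{\alpha_t}x_{t-1})^2}{\beta_t}+\frac{(x_{t-1}-\sqrt{\bar{\alpha_{t-1}}}x_0)^2}{1-\bar{\alpha_{t-1}}}-\frac{(x_t-\sqrt{\bar{\alpha_t}}x_0)^2}{1-\bar{\alpha_t}}\right]\right)$

Expanding this and collecting the terms in $x_{t-1}$:

$=\exp\left(-\frac{1}{2}\left[\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha_{t-1}}}\right)x_{t-1}^2-2\left(\frac{\sqrt{\alpha_t}}{\beta_t}x_t+\frac{\sqrt{\bar{\alpha_{t-1}}}}{1-\bar{\alpha_{t-1}}}x_0\right)x_{t-1}+C(x_t,x_0)\right]\right)$

where $C(x_t,x_0)$ collects the terms that do not contain $x_{t-1}$; it can be treated as a constant and does not affect the result.

Comparing with the exponent of a Gaussian density,

$\exp\left(-\frac{(x-u)^2}{2\sigma^2}\right)=\exp\left(-\frac{1}{2}\left(\frac{1}{\sigma^2}x^2-\frac{2u}{\sigma^2}x+\frac{u^2}{\sigma^2}\right)\right)$

we can complete the square and read off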

$\frac{1}{\sigma^2}=\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha_{t-1}}}$ ....(14)

$\frac{u}{\sigma^2}=\frac{\sqrt{\alpha_t}}{\beta_t}x_t+\frac{\sqrt{\bar{\alpha_{t-1}}}x_0}{1-\bar{\alpha_{t-1}}}$ ....(15)

From formulas (14) and (15) we obtain formula (16)

$u=\frac{u}{\sigma^2}/(\frac{1}{\sigma^2})$

$=\frac{\sqrt{\alpha_t}x_t(1-\bar{\alpha_{t-1}})+\sqrt{\bar{\alpha_{t-1}}}x_0\beta_t}{\alpha_t(1-\bar{\alpha_{t-1}})+\beta_t}$ ....(16)

Using $\alpha_t=1-\beta_t$ and $\alpha_t\bar{\alpha_{t-1}}=\bar{\alpha_t}$,

$\alpha_t(1-\bar{\alpha_{t-1}})+\beta_t=\alpha_t-\alpha_t\bar{\alpha_{t-1}}+\beta_t=1-\bar{\alpha_t}$

so $u_{t-1}=\frac{\sqrt{\alpha_t}x_t(1-\bar{\alpha_{t-1}})+\sqrt{\bar{\alpha_{t-1}}}x_0\beta_t}{1-\bar{\alpha_t}}...(17)$

$=\frac{\sqrt{\alpha_t}(1-\bar{\alpha_{t-1}})}{1-\bar{\alpha_t}}x_t+\frac{\sqrt{\bar{\alpha_{t-1}}}\beta_t}{1-\bar{\alpha_t}}x_0...(18)$

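Using

$x_t=\sqrt{\bar{\alpha_t}}x_0+\sqrt{1-\bar{\alpha_t}}\epsilon_0$

we get $x_0 = \frac{x_t-\sqrt{1-\bar{\alpha_t}}\epsilon_0}{\sqrt{\bar{\alpha_t}}}...(19)$

Substituting (19) into (18):

$u_{t-1}=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha_t}}}\epsilon_0\right)$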
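The algebra in (14) to (19) can be sanity-checked numerically. The sketch below uses arbitrary scalar values (all of them assumptions chosen for illustration) and computes the posterior mean three ways: from (14)/(15), from (18), and from the noise form used in the update formula of 3.1.

```python
import math

alpha_t, a_bar_prev = 0.98, 0.60  # arbitrary illustrative values
beta_t = 1.0 - alpha_t
a_bar_t = alpha_t * a_bar_prev    # alpha_bar_t = alpha_t * alpha_bar_{t-1}
x0, eps = 0.7, -0.3               # a scalar "image" and a noise value
x_t = math.sqrt(a_bar_t) * x0 + math.sqrt(1.0 - a_bar_t) * eps

# (14), (15): precision and precision-weighted mean read off the quadratic in x_{t-1}
inv_var = alpha_t / beta_t + 1.0 / (1.0 - a_bar_prev)
u_over_var = (math.sqrt(alpha_t) / beta_t * x_t
              + math.sqrt(a_bar_prev) / (1.0 - a_bar_prev) * x0)
u_from_14_15 = u_over_var / inv_var

# (18): mean as a weighted combination of x_t and x_0
u_from_18 = (math.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t) * x_t
             + math.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t) * x0)

# (19) substituted into (18): mean in terms of x_t and the noise only
u_from_eps = (x_t - (1.0 - alpha_t) / math.sqrt(1.0 - a_bar_t) * eps) / math.sqrt(alpha_t)

# Posterior variance implied by (14)
posterior_var = 1.0 / inv_var

print(u_from_14_15, u_from_18, u_from_eps, posterior_var)
```

All three means agree up to rounding, and the variance from (14) simplifies to $\frac{\beta_t(1-\bar{\alpha_{t-1}})}{1-\bar{\alpha_t}}$.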

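4 Code

Paper code: https://github.com/hojonathanho/diffusion

4.1 Overall architecture

The project consists of 4 files, located in the attachment directory.

4.2 Diffusion.py

It mainly uses the three formulas marked in red in the preface to implement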

Forward Diffusion Process

Reverse diffusion process

4.3 Utility.py: shared modules

4.3.1 Time encoding, using the sine/cosine positional encoder from the Transformer

4.3.2 SelfAttentionBlock

4.3.3 DownSample

4.3.4 Upsample
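The sine/cosine time encoding listed above can be sketched in plain Python. This is not the post's Utility.py. The function name, the `max_period` value of 10000, and the layout (all sine terms first, then all cosine terms, not interleaved) are assumptions.

```python
import math


def sinusoidal_time_embedding(t, dim, max_period=10000.0):
    """Return a length-`dim` list: sine terms in the first half, cosine terms in the second.
    Frequencies decay geometrically from 1 towards 1 / max_period."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


emb = sinusoidal_time_embedding(t=10, dim=8)
print(emb)
```

Each timestep maps to a distinct vector with entries in $[-1,1]$, which lets the network distinguish noise levels without treating $t$ as a raw integer.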

4.4 Unet

It uses an encoder-decoder architecture with three main parts

4.4.1 DownConv

4.4.2 MibBlock

4.4.3

Training Process

In each batch of the training process, the following steps are taken:

Sampling a random timestep t for each training sample within the batch (e.g. images)
Adding Gaussian noise by using the closed-form formula, according to their timesteps t
Converting the timesteps into embeddings for feeding the U-Net or similar models (or other family of models)
Using the images with noise and time embeddings as input for predicting the noise present in the images
Comparing the predicted noise to the actual noise for calculating the loss function
Updating the Diffusion Model parameters via backpropagation using the loss function

This process repeats at each epoch, using the same images. However, different timesteps are usually sampled for each image at different epochs. This enables the model to learn to reverse the diffusion process at any timestep, enhancing its adaptability.
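The loss computation in the steps above can be sketched without a deep learning framework. Here `predict_noise` is a hypothetical stand-in for the U-Net with its time embedding, and the backpropagation update is deliberately left out. The schedule and batch are toy assumptions.

```python
import math
import random


def training_loss(batch, alpha_bars, predict_noise, rng=random):
    """Mean squared error between the true and the predicted noise for one batch."""
    T = len(alpha_bars)
    total, count = 0.0, 0
    for x0 in batch:
        t = rng.randint(1, T)                    # random timestep per sample
        a_bar = alpha_bars[t - 1]
        eps = [rng.gauss(0.0, 1.0) for _ in x0]  # the actual noise
        x_t = [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
               for v, e in zip(x0, eps)]         # closed-form noising
        eps_hat = predict_noise(x_t, t)          # stand-in for the model call
        total += sum((e - h) ** 2 for e, h in zip(eps, eps_hat))
        count += len(x0)
    return total / count


T = 100  # assumed number of steps for this toy example
alpha_bars, running = [], 1.0
for i in range(T):
    running *= 1.0 - (1e-4 + (0.02 - 1e-4) * i / (T - 1))
    alpha_bars.append(running)

random.seed(0)
batch = [[0.5, -1.0, 0.25, 0.0] for _ in range(64)]
zero_model = lambda x_t, t: [0.0] * len(x_t)  # untrained stand-in: always predicts no noise
loss = training_loss(batch, alpha_bars, zero_model)
print(loss)  # roughly 1, the variance of standard normal noise
```

In an actual PyTorch training loop the same loss would be computed on tensors and followed by a backward pass and an optimizer step.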

