Noise Conditional Score Network (NCSN)
$$p_\sigma(\tilde{\mathrm{x}}|\mathrm{x}) := \mathcal{N}(\tilde{\mathrm{x}}; \mathrm{x}, \sigma^2\mathbf{I})$$
$$p_\sigma(\tilde{\mathrm{x}}) := \int p_{data}(\mathrm{x})\,p_\sigma(\tilde{\mathrm{x}}|\mathrm{x})\,d\mathrm{x}$$
where $p_{data}(\mathrm{x})$ denotes the target data distribution.
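Since the perturbation kernel is Gaussian, it can be sampled by reparameterization, $\tilde{\mathrm{x}} = \mathrm{x} + \sigma z$ with $z \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$; a minimal sketch (the function name `perturb` is illustrative, not from the text):

```python
import torch

def perturb(x: torch.Tensor, sigma: float) -> torch.Tensor:
    # Draw x_tilde ~ N(x_tilde; x, sigma^2 I) as x + sigma * z, z ~ N(0, I)
    return x + sigma * torch.randn_like(x)
```

Note that training and sampling only ever require this forward perturbation, never the integral defining $p_\sigma(\tilde{\mathrm{x}})$ itself.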
$$\sigma_{\mathrm{min}}=\sigma_1<\sigma_2<\cdots<\sigma_N=\sigma_{\mathrm{max}}$$
$\sigma_{\mathrm{min}}$ is small enough that $p_{\sigma_{\mathrm{min}}}(\mathrm{x}) \approx p_{data}(\mathrm{x})$, and
$\sigma_{\mathrm{max}}$ is large enough that $p_{\sigma_{\mathrm{max}}}(\mathrm{x}) \approx \mathcal{N}(\mathrm{x}; \mathbf{0}, \sigma^2_{\mathrm{max}}\mathbf{I})$.
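In practice the levels $\sigma_1 < \cdots < \sigma_N$ are commonly spaced as a geometric sequence between $\sigma_{\mathrm{min}}$ and $\sigma_{\mathrm{max}}$; a sketch (the endpoint values and function name are illustrative):

```python
import torch

def geometric_sigmas(sigma_min: float, sigma_max: float, N: int) -> torch.Tensor:
    # sigma_i = sigma_min * (sigma_max / sigma_min)^((i-1)/(N-1)), i = 1, ..., N
    ratio = sigma_max / sigma_min
    return sigma_min * ratio ** (torch.arange(N, dtype=torch.float64) / (N - 1))

sigmas = geometric_sigmas(0.01, 50.0, 10)  # sigma_1 = 0.01, sigma_10 = 50.0
```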
The score network $s_\theta(\tilde{\mathrm{x}}, \sigma)$ is trained with a weighted sum of denoising score matching objectives over all noise levels:

$$\theta^{*} = \argmin_\theta \sum_{i=1}^N \sigma_i^2\, \mathbb{E}_{\mathrm{x}\sim p_{data}(\mathrm{x})}\mathbb{E}_{\tilde{\mathrm{x}}\sim p_{\sigma_i}(\tilde{\mathrm{x}}|\mathrm{x})} \Big[ \|s_\theta(\tilde{\mathrm{x}}, \sigma_i) - \nabla_{\tilde{\mathrm{x}}}\log p_{\sigma_i}(\tilde{\mathrm{x}}|\mathrm{x})\|^2_2\Big]$$
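For the Gaussian kernel the target score has the closed form $\nabla_{\tilde{\mathrm{x}}}\log p_{\sigma}(\tilde{\mathrm{x}}|\mathrm{x}) = -(\tilde{\mathrm{x}} - \mathrm{x})/\sigma^2$, so the objective reduces to a regression. A single-sample Monte Carlo sketch of the loss (the function name `dsm_loss` is illustrative, not from the text):

```python
import torch

def dsm_loss(score_net, x, sigmas):
    """One-sample Monte Carlo estimate of the weighted DSM objective.

    x: batch of clean data, shape (B, ...)
    sigmas: 1-D tensor of noise levels sigma_1, ..., sigma_N
    """
    loss = 0.0
    for sigma in sigmas:
        z = torch.randn_like(x)
        x_tilde = x + sigma * z
        # nabla_{x_tilde} log p_sigma(x_tilde | x) = -(x_tilde - x) / sigma^2
        target = -(x_tilde - x) / sigma ** 2
        s = score_net(x_tilde, sigma)
        # sigma_i^2 weighting, squared L2 norm per sample, mean over the batch
        loss = loss + sigma ** 2 * ((s - target) ** 2).flatten(1).sum(dim=1).mean()
    return loss
```

The $\sigma_i^2$ weighting balances the magnitudes of the per-level terms, since the target score scales like $1/\sigma_i$.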
Once the model is trained, samples are drawn by running $M$ steps of Langevin MCMC at each noise level:
$$x_i^m = x_i^{m-1} + \epsilon_i\, s_{\theta^{*}}(x_i^{m-1}, \sigma_i) + \sqrt{2\epsilon_i}\,z_i^m, \quad m=1,2,\cdots, M$$
where $\epsilon_i>0$ is the step size and $z_i^m$ is standard Gaussian noise. This procedure is repeated for $i=N, N-1, \cdots, 1$; that is, for each noise level, $M$ Langevin steps are run until the samples settle near the high-density regions of that noise level's distribution.
$$x_N^0 \sim \mathcal{N}(\mathrm{x}|\mathbf{0}, \sigma^2_{\mathrm{max}}\mathbf{I}), \quad x_i^0 = x_{i+1}^M \ \mathrm{when}\ i < N$$
Sample code:
```python
import torch
import torch.nn as nn

def langevin_sampling(score_network, noise_levels, num_steps, step_size, batch_size, device):
    # Initialize samples from the prior N(0, sigma_max^2 I)
    x = torch.randn(batch_size, 3, 32, 32, device=device) * noise_levels[0]
    # Annealed sampling: iterate noise levels from high to low,
    # gradually moving samples toward the target data distribution
    for sigma in noise_levels:
        print(f"Sampling at noise level: {sigma}")
        # Langevin dynamics iterations at the current noise level
        for _ in range(num_steps):
            # Score (gradient of the log-density), predicted by the score network
            with torch.no_grad():
                grad = score_network(x, sigma)  # current samples and noise level
            # Gradient ascent step (a constant step size for simplicity;
            # NCSN typically anneals it per noise level)
            x = x + step_size * grad
            # Noise injection step
            noise = torch.randn_like(x) * (2 * step_size) ** 0.5
            x = x + noise
    return x

class DummyScoreNet(nn.Module):
    def forward(self, x, sigma):
        # Analytic score of N(0, sigma^2 I); a stand-in for a trained network
        return -x / (sigma ** 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
score_network = DummyScoreNet().to(device)
noise_levels = [50.0, 25.0, 10.0, 5.0, 1.0]
num_steps = 50
step_size = 0.1
batch_size = 64

samples = langevin_sampling(score_network, noise_levels,
                            num_steps, step_size, batch_size, device)
```