GAN Paper Reading Notes

article 2024/11/13 19:16:07

This is an older paper from 2014, so these notes only record the key points. The paper is linked here:

Generative Adversarial Nets (neurips.cc)

Contents

GAN Paper Reading Notes
- Motivation
- Contributions
- Design
- Training code
- Network architecture code
- Test code

Motivation

Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to difficulty of leveraging the benefits of piecewise linear units in the generative context.

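Generative models at the time performed poorly because it is hard to approximate the many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies. It was also difficult to carry the benefits of piecewise linear units over to the generative setting. The authors therefore propose a new generative model: GAN.

My understanding: generative models at the time tried to learn the data distribution explicitly, for example by estimating parameters such as a mean and a variance. That is hard to learn and computationally heavy. The authors instead train the generative model end to end: rather than learning an explicit distribution, they learn the model directly. As long as the model's outputs are close to the ground truth, the model itself can stand in for the distribution when generating data. This is a typical black-box approach.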

Contributions

a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.

Contribution 1: an adversarial learning strategy in which two models compete against and constrain each other. One model is the Generator, the other is the Discriminator. The generator tries to produce data as close to real as possible, while the discriminator tries to recognise the generator's data as fake.

In this article, we explore the special case when the generative model generates samples by passing random noise through a multilayer perceptron, and the discriminative model is also a multilayer perceptron.

Contribution 2: when both models are neural networks, they can be trained with backpropagation and dropout, which avoids the need for Markov chains.
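To make the "noise through a multilayer perceptron" idea concrete, here is a minimal pure-Python sketch of sampling from a generator with a single forward pass. The layer sizes, the random initialisation, and the helper names (`linear`, `random_layer`, `generate`) are illustrative assumptions, not taken from the paper; a trained generator would hold learned weights instead.

```python
import math
import random

def linear(x, weights, bias):
    """Affine layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def random_layer(n_in, n_out, rng):
    """Hypothetical random initialisation, used only for illustration."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def generate(z, layers):
    """G(z; theta_g): ReLU on hidden layers, tanh on the output layer."""
    h = z
    for weights, bias in layers[:-1]:
        h = [max(0.0, v) for v in linear(h, weights, bias)]
    weights, bias = layers[-1]
    return [math.tanh(v) for v in linear(h, weights, bias)]

rng = random.Random(0)
z_dim, hidden, x_dim = 4, 8, 6  # toy sizes chosen for this sketch
theta_g = [random_layer(z_dim, hidden, rng), random_layer(hidden, x_dim, rng)]

z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]  # z drawn from a standard normal prior
sample = generate(z, theta_g)  # one forward pass yields one sample, no Markov chain
```

Because every step is a differentiable function of the weights, gradients from a discriminator can flow back into `theta_g`, which is what makes plain backpropagation sufficient.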

Design

To learn the generator’s distribution p_g over data x, we define a prior on input noise variables p_z(z), then represent a mapping to data space as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g. We also define a second multilayer perceptron D(x; θ_d) that outputs a single scalar. D(x) represents the probability that x came from the data rather than p_g.

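1. Input: to bring the generator's distribution p_g close to the real data distribution p_data, G is fed a noise variable z and its parameters θ_g (the weights of the G network) are learned. G can therefore be written as G(z; θ_g).
$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log(1 - D(G(z)))\right]$
2. Adversarial loss: in the code, the discriminator's loss is the sum of two BCELoss terms, which is a minibatch estimate of -V. D is trained to maximise V, making D(x) large on real data and D(G(z)) small on generated data; G is then trained to push D(G(z)) back up. The two updates happen in a fixed order, explained later.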

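The code first creates two labels by hand. The first, built with torch.ones, is an all-ones tensor of shape (batch, 1), where batch is the batch size and the second dimension is just the single number 1. Used in the discriminator's BCELoss on real images, it gives the left-hand expectation of the adversarial loss (with the sign flipped). The second, built with torch.zeros, is an all-zeros tensor of the same shape (batch, 1). It is also used in the discriminator's BCELoss, this time on generated images, and gives the right-hand expectation (again with the sign flipped). The generator's own update reuses the all-ones label, so G minimises -log D(G(z)) rather than log(1 - D(G(z))).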
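The link between the two labels and the two expectations can be checked by hand. The sketch below reimplements the mean binary cross-entropy formula in pure Python (the same formula nn.BCELoss uses with its default mean reduction, ignoring its internal clamping of the log) and applies it to made-up discriminator outputs; the numbers in `d_real` and `d_fake` are hypothetical.

```python
import math

def bce(preds, targets):
    """Mean binary cross-entropy: -[t*log(p) + (1-t)*log(1-p)], averaged."""
    total = 0.0
    for p, t in zip(preds, targets):
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(preds)

d_real = [0.9, 0.8, 0.7]  # hypothetical D(x) on three real samples
d_fake = [0.2, 0.1, 0.3]  # hypothetical D(G(z)) on three generated samples
ones = [1.0] * 3
zeros = [0.0] * 3

# Discriminator loss as in the training script: two BCE terms added together
d_loss = bce(d_real, ones) + bce(d_fake, zeros)

# Minibatch estimate of V(D, G) from the formula above
v_estimate = (sum(math.log(p) for p in d_real) / 3
              + sum(math.log(1.0 - p) for p in d_fake) / 3)

# Generator loss as in the training script: fake samples scored against label 1
g_loss = bce(d_fake, ones)  # equals the mean of -log D(G(z))
```

Here `d_loss` equals `-v_estimate`, so minimising the discriminator's BCE sum is the same as maximising V on the minibatch.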

we alternate between k steps of optimizing D and one step of optimizing G.

This results in D being maintained near its optimal solution, so long as G changes slowly enough.

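3. D and G are trained in a fixed order: the discriminator D is updated before the generator G, with k steps on D for every one step on G. This keeps D close to its optimum as long as G changes slowly enough. (The training code in these notes uses k = 1.)

If the generator G becomes too strong, the discriminator can no longer tell its samples apart from real data and the adversarial game loses its purpose. Conversely, if the discriminator D is too strong, the generator learns very slowly.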
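The "optimal solution" for D mentioned in the quote has a closed form in the paper: for a fixed G, D*(x) = p_data(x) / (p_data(x) + p_g(x)). The following sketch evaluates it on toy discrete distributions; the three-outcome distributions are made up for illustration, and the function names are hypothetical.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)), evaluated pointwise."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(d, p_data, p_g):
    """V(D, G) for discrete distributions: E_data[log D] + E_g[log(1 - D)]."""
    return sum(pd * math.log(dx) + pg * math.log(1.0 - dx)
               for dx, pd, pg in zip(d, p_data, p_g))

p_data = [0.5, 0.3, 0.2]   # toy "real" distribution over three outcomes
p_g_far = [0.2, 0.3, 0.5]  # a generator that is still off
p_g_match = list(p_data)   # a generator that matches the data exactly

d_far = optimal_discriminator(p_data, p_g_far)
d_match = optimal_discriminator(p_data, p_g_match)

v_far = value(d_far, p_data, p_g_far)
v_match = value(d_match, p_data, p_g_match)
```

When p_g equals p_data, the best discriminator can do no better than output 1/2 everywhere, and V drops to -log 4, which is the global minimum derived in the paper.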

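[Figure: Algorithm 1 from the paper, minibatch stochastic gradient descent training of generative adversarial nets]

4. The algorithm is shown in the figure above.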
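The alternating schedule from point 3 can be sketched as a small generator function. This only shows the order of updates, not the updates themselves; `training_schedule` is a hypothetical helper name.

```python
def training_schedule(num_iterations, k):
    """Yield the update order: k discriminator steps, then one generator step."""
    for _ in range(num_iterations):
        for _ in range(k):
            yield "D"
        yield "G"

order_k3 = list(training_schedule(2, 3))
order_k1 = list(training_schedule(2, 1))
```

With k = 1 the order reduces to D, G, D, G, which is the schedule the training script in these notes follows.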

Training code

import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torchvision.utils import save_image
from torch.utils.data import DataLoader
from Model import generator
from Model import discriminator

import os

# Create the output directories used for intermediate results
os.makedirs('./img_gan', exist_ok=True)
os.makedirs('./test_result', exist_ok=True)


def to_img(x):  # map outputs from [-1, 1] to [0, 1] before saving as images
    out = 0.5 * (x + 1)
    out = out.clamp(0, 1)
    out = out.view(-1, 1, 28, 28)
    return out


batch_size = 96
num_epoch = 200
z_dimension = 100


# Data preprocessing
img_transform = transforms.Compose([
    transforms.ToTensor(),  # convert the image to a tensor scaled to [0, 1]
    transforms.Normalize([0.5], [0.5])  # (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]; first argument is the mean, second the std
])
# MNIST dataset
mnist = datasets.MNIST(
    root='./data', train=True, transform=img_transform, download=True)
# Data loader
dataloader = torch.utils.data.DataLoader(
    dataset=mnist, batch_size=batch_size, shuffle=True)

D = discriminator()  # create the discriminator
G = generator()  # create the generator
if torch.cuda.is_available():  # move to GPU
    D = D.cuda()
    G = G.cuda()

criterion = nn.BCELoss()  # BCELoss, since this can be treated as binary classification; without a final Sigmoid, use BCEWithLogitsLoss instead
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0003)  # optimizer for D
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0003)  # optimizer for G

# Start training
for epoch in range(num_epoch):
    for i, (img, _) in enumerate(dataloader):  # img [96,1,28,28]
        G.train()
        num_img = img.size(0)  # num_img = batch size
        # =================train discriminator
        img = img.view(num_img, -1)  # flatten the images for the discriminator [96,784]
        real_img = img.cuda()  # real images, moved to the GPU

        real_label = torch.ones(num_img).reshape(num_img, 1).cuda()  # target 1 for real_img [96,1]
        fake_label = torch.zeros(num_img).reshape(num_img, 1).cuda()  # target 0 for fake_img [96,1]

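        # Train the discriminator first
        # Loss on real images
        real_out = D(real_img)  # discriminator output on real images [96,1]
        d_loss_real = criterion(real_out, real_label)  # want real_out as close to 1 as possible [1]
        real_scores = real_out  # kept for printing later

        # Loss on generated images
        z = torch.randn(num_img, z_dimension).cuda()  # 100-dimensional random noise as generator input [96,100]
        #   the z dimension matches the first argument of the generator's first Linear layer
        # detach so that no gradients are computed for G
        fake_img = G(z).detach()  # generated fake images [96,784]
        fake_out = D(fake_img)  # discriminator output on the fake images [96,1]

        d_loss_fake = criterion(fake_out, fake_label)  # want fake_out as close to 0 as possible [1]
        fake_scores = fake_out  # kept for printing later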

        d_loss = d_loss_real + d_loss_fake

        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

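        # Train the generator
        # Loss on generated images
        z = torch.randn(num_img, z_dimension).cuda()  # fresh random noise [96,100]

        fake_img = G(z)  # generator produces fake images [96,784]
        output = D(fake_img)  # discriminator judges the fake images [96,1]
        g_loss = criterion(output, real_label)  # generator wants the discriminator's output close to 1 [1]

        # Update the generator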
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

        if (i + 1) % 100 == 0:
            print(
                f'Epoch [{epoch}/{num_epoch}], d_loss: {d_loss.cpu().detach():.6f}, g_loss: {g_loss.cpu().detach():.6f}',
                f'D real: {real_scores.cpu().detach().mean():.6f}, D fake: {fake_scores.cpu().detach().mean():.6f}')
    if epoch == 0:  # save one batch of real images for reference
        real_images = to_img(real_img.detach().cpu())
        save_image(real_images, './img_gan/real_images.png')

    fake_images = to_img(fake_img.detach().cpu())
    save_image(fake_images, f'./img_gan/fake_images-{epoch + 1}.png')

    G.eval()
    with torch.no_grad():
        new_z = torch.randn(batch_size, 100).cuda()
        test_img = G(new_z)
        print(test_img.shape)
        test_img = to_img(test_img.detach().cpu())
        test_path = f'./test_result/the_{epoch}.png'
        save_image(test_img, test_path)

# Save the models
torch.save(G.state_dict(), './generator.pth')
torch.save(D.state_dict(), './discriminator.pth')
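
The preprocessing and to_img in the script above are inverses of each other on the valid pixel range. A minimal sketch of that arithmetic on single values, without torchvision; `normalize` and `to_unit_range` are hypothetical stand-ins for transforms.Normalize([0.5], [0.5]) and the scaling part of to_img:

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize([0.5], [0.5]) on one value."""
    return (pixel - mean) / std

def to_unit_range(value):
    """Same arithmetic as to_img before the reshape: clamp(0.5 * (x + 1), 0, 1)."""
    return min(1.0, max(0.0, 0.5 * (value + 1.0)))

pixels = [0.0, 0.25, 0.5, 1.0]                      # values in [0, 1], as after ToTensor
normalized = [normalize(p) for p in pixels]         # now in [-1, 1]
restored = [to_unit_range(v) for v in normalized]   # back in [0, 1]
```

This is also why the generator ends in Tanh: its outputs land in [-1, 1], the same range as the normalised training images.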

Network architecture code

import torch
from torch import nn


# Discriminator: decides whether an image comes from the MNIST dataset
class discriminator(nn.Module):
    def __init__(self):
        super(discriminator, self).__init__()
        self.dis = nn.Sequential(
            nn.Linear(784, 256),  # 784=28*28
            nn.LeakyReLU(0.2),
            nn.Linear(256, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
            #   sigmoid outputs the probability that the input is a real image (binary classification)
        )

    def forward(self, x):
        x = self.dis(x)
        return x


# Generator: produces fake MNIST-like images
class generator(nn.Module):
    def __init__(self):
        super(generator, self).__init__()
        self.gen = nn.Sequential(
            nn.Linear(100, 256),  # input is 100-dimensional random noise
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            #   the generator's output dimension matches a flattened real image (784)
            nn.Tanh()
        )

    def forward(self, x):
        x = self.gen(x)
        return x


class FinetuneModel(nn.Module):
    def __init__(self, weights):
        super(FinetuneModel, self).__init__()
        self.G = generator()
        base_weights = torch.load(weights)

        model_parameters = dict(self.G.named_parameters())
        #   call named_parameters on the inner network (self.G), not on the wrapper model; otherwise the keys carry the wrapper's prefix and do not match the saved weights
        pretrained_weights = {k: v for k, v in base_weights.items() if k in model_parameters}

        new_state_dict = {k: pretrained_weights[k] for k in model_parameters.keys()}
        self.G.load_state_dict(new_state_dict)

    def forward(self, input):
        output = self.G(input)
        return output
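
As a quick sanity check on the sizes of the two networks above, the number of trainable parameters can be worked out from the layer widths alone. This is plain arithmetic, not a PyTorch call; `linear_params` and `mlp_params` are hypothetical helper names.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(sizes):
    """Total parameters of a stack of Linear layers with the given widths."""
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))

generator_params = mlp_params([100, 256, 256, 784])    # 100 -> 256 -> 256 -> 784
discriminator_params = mlp_params([784, 256, 256, 1])  # 784 -> 256 -> 256 -> 1
```

The two networks are of comparable size (about 293k and 267k parameters), so neither side starts with an obvious capacity advantage.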

Test code

import os
import sys
import numpy as np
import torch
import argparse
import torch.utils.data
from PIL import Image
from Model import FinetuneModel
from Model import generator
from torchvision.utils import save_image

parser = argparse.ArgumentParser("GAN")
parser.add_argument('--save_path', type=str, default='./test_result')
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--seed', type=int, default=2)
parser.add_argument('--model', type=str, default='generator.pth')

args = parser.parse_args()
save_path = args.save_path
os.makedirs(save_path, exist_ok=True)


def to_img(x):  # map outputs from [-1, 1] to [0, 1] before saving as images
    out = 0.5 * (x + 1)
    out = out.clamp(0, 1)
    out = out.view(-1, 1, 28, 28)
    return out


def main():
    if not torch.cuda.is_available():
        print("no gpu device available")
        sys.exit(1)

    torch.manual_seed(args.seed)  # use the --seed argument so the sampled noise is reproducible

    model = FinetuneModel(args.model)
    model = model.to(device=args.gpu)
    model.eval()

    z_dimension = 100

    with torch.no_grad():
        for i in range(100):
            z = torch.randn(96, z_dimension).to(device=args.gpu)  # 100-dimensional random noise as generator input [96,100]
            output = model(z)
            print(output.shape)
            u_name = f'the_{i}.png'
            print(f'processing {u_name}')
            u_path = save_path + '/' + u_name
            output = to_img(output.cpu().detach())
            save_image(output, u_path)


if __name__ == '__main__':
    main()