CycleGAN-VC Model and Code Implementation

Hands-on with the CycleGAN-VC code

A previous post covered the theory behind the CycleGAN-VC model; how is it actually implemented in code? In this post we work through the first-generation CycleGAN-VC.
The implementation is built on the TensorFlow deep learning framework (the TensorFlow 1.x API).

Generator

Generator architecture:
(figure: generator architecture diagram)
Generator code:

import tensorflow as tf 

def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'):

    # inputs has shape [batch_size, num_features, time]
    # we need to convert it to [batch_size, time, num_features] for 1D convolution
    inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose')

    with tf.variable_scope(scope_name) as scope:
        # Generator variables are reused when the generator is called a second time in CycleGAN
        if reuse:
            scope.reuse_variables()
        else:
            assert scope.reuse is False

        h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv')
        h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates')
        h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu')

        # Downsample
        d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_')
        d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_')

        # Residual blocks
        r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_')
        r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_')
        r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_')
        r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_')
        r5 = residual1d_block(inputs = r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_')
        r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_')

        # Upsample
        u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_')
        u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_')

        # Output
        o1 = conv1d_layer(inputs = u2, filters = 24, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv')
        o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose')

    return o2
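
For reference, here is a minimal sketch of how this generator might be wired up in a TensorFlow 1.x graph. The placeholder shape and the scope name generator_A2B are assumptions for illustration; only the 24-dimensional feature axis comes from the output layer above.

input_A = tf.placeholder(tf.float32, shape = [None, 24, None], name = 'input_A')
input_A2 = tf.placeholder(tf.float32, shape = [None, 24, None], name = 'input_A2')

# The first call creates the generator variables ...
generation_B = generator_gatedcnn(input_A, reuse = False, scope_name = 'generator_A2B')
# ... a second call with reuse = True shares those variables instead of creating new ones
generation_B2 = generator_gatedcnn(input_A2, reuse = True, scope_name = 'generator_A2B')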

Each of the building blocks used above is described below.

1D convolution: conv1d_layer
def conv1d_layer(
    inputs, 
    filters, 
    kernel_size, 
    strides = 1, 
    padding = 'same', 
    activation = None,
    kernel_initializer = None,
    name = None):

    conv_layer = tf.layers.conv1d(
        inputs = inputs,
        filters = filters,
        kernel_size = kernel_size,
        strides = strides,
        padding = padding,
        activation = activation,
        kernel_initializer = kernel_initializer,
        name = name)

    return conv_layer
GLU module: gated_linear_layer

As shown above, h1_glu takes both h1 and h1_gates as inputs; the gated linear unit multiplies the features element-wise by a sigmoid of the gates:

def gated_linear_layer(inputs, gates, name = None):

    activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name)

    return activation

———————————————————————————————————————————————————————
Note:

tf.multiply performs element-wise multiplication. It multiplies the corresponding elements of two tensors and returns a new tensor whose shape is typically the broadcast of the two input shapes.
Basic usage of tf.multiply:

import tensorflow as tf

# Create two simple tensors
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])

# Element-wise multiplication
product = tf.multiply(a, b)

# Start a TensorFlow session and evaluate the product
with tf.Session() as sess:
    result = sess.run(product)
    print(result)  # Output: [ 4 10 18]

In this example, a and b are one-dimensional tensors, both of shape [3]. tf.multiply(a, b) returns a new tensor containing the element-wise products of a and b.
tf.multiply also supports broadcasting, which means you can multiply tensors of different shapes as long as the shapes are compatible in each dimension. For example:

# Create tensors of different shapes
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([5, 6])

# Element-wise multiplication; b is automatically broadcast to match a's shape
product = tf.multiply(a, b)

# Start a TensorFlow session and evaluate the product
with tf.Session() as sess:
    result = sess.run(product)
    print(result)  # Output: [[ 5 12] [15 24]]

In this example, b has shape [2] while a has shape [2, 2]. Because b's shape is compatible with a's trailing dimension, b is broadcast (effectively replicated along the first dimension) to match a's shape before the element-wise multiplication is performed.

In TensorFlow 2.x, the shorthand * operator is usually preferred over calling tf.multiply directly; since eager execution is enabled by default, element-wise multiplication becomes more direct and reads like ordinary Python, as the small example below shows.
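
A TensorFlow 2.x sketch of the same element-wise multiplication, with eager execution and no session:

import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])

# The * operator performs the same element-wise multiplication as tf.multiply
print((a * b).numpy())            # [ 4 10 18]
print(tf.multiply(a, b).numpy())  # [ 4 10 18]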

———————————————————————————————————————————————————————

1D downsampling block: downsample1d_block
def downsample1d_block(
    inputs, 
    filters, 
    kernel_size, 
    strides,
    name_prefix = 'downsample1d_block_'):

    h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv')
    h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm')
    h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates')
    h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates')
    h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu')

    return h1_glu

It uses an instance_norm_layer helper:

def instance_norm_layer(
    inputs, 
    epsilon = 1e-06, 
    activation_fn = None, 
    name = None):

    instance_norm_layer = tf.contrib.layers.instance_norm(
        inputs = inputs,
        epsilon = epsilon,
        activation_fn = activation_fn)

    return instance_norm_layer

tf.contrib.layers.instance_norm implements instance normalization, a technique that normalizes each instance of the input independently; it is commonly used in style transfer and image enhancement tasks.
Note that the tf.contrib module was removed in TensorFlow 2.x, as the TensorFlow team decided to stop maintaining it. If you are using TensorFlow 2.x, you need another way to implement instance normalization.

Here is one way to implement instance normalization with the TensorFlow 2.x core API:

import tensorflow as tf

def instance_norm_layer(inputs, epsilon=1e-6, activation_fn=None, name=None):
    with tf.name_scope(name or 'instance_norm'):
        # Compute the per-instance mean and variance over every axis except
        # batch (0) and channels (last), so this also works for the 3D
        # [batch, time, channels] tensors used by the 1D generator above
        reduce_axes = list(range(1, inputs.shape.rank - 1))
        mean, variance = tf.nn.moments(inputs, axes=reduce_axes, keepdims=True)
        
        # Normalize
        normalized = (inputs - mean) / tf.sqrt(variance + epsilon)
        
        # Learnable parameters that scale and shift the normalized output
        # (in a real model these should be created once, e.g. inside a Keras layer,
        # so they persist across calls instead of being re-created every time)
        scale = tf.Variable(tf.ones(inputs.shape[-1]), name='scale')
        shift = tf.Variable(tf.zeros(inputs.shape[-1]), name='shift')
        
        # Apply scale and shift
        normalized = scale * normalized + shift
        
        # Optional activation function
        if activation_fn is not None:
            normalized = activation_fn(normalized)
        
        return normalized
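
A quick sanity check of this function on a random [batch, time, channels] tensor (sizes chosen arbitrarily for illustration): with the initial scale of 1 and shift of 0, each instance should come out roughly zero-mean and unit-variance along the time axis.

x = tf.random.normal([2, 128, 24])
y = instance_norm_layer(x)

print(tf.reduce_mean(y, axis=1)[0, :3].numpy())      # close to 0
print(tf.math.reduce_std(y, axis=1)[0, :3].numpy())  # close to 1
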
Residual block: residual1d_block
def residual1d_block(
    inputs, 
    filters = 1024, 
    kernel_size = 3, 
    strides = 1,
    name_prefix = 'residule_block_'):

    h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv')
    h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm')
    h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates')
    h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates')
    h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu')
    h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv')
    h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm')
    
    h3 = inputs + h2_norm 

    return h3
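
Why filters // 2 in the second convolution: the residual addition inputs + h2_norm is only valid if h2_norm has the same number of channels as the block input. A shape trace through the first residual block of the generator above (a sketch, assuming the [batch, time, channels] layout used after the input transpose):

# d2 (block input)   : [batch, time / 4, 512]
# h1_glu             : [batch, time / 4, 1024]   (filters = 1024)
# h2_norm            : [batch, time / 4, 512]    (filters // 2 = 512)
# h3 = d2 + h2_norm  : [batch, time / 4, 512]    (channel counts match, so the skip connection works)
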
Upsampling block: upsample1d_block
def upsample1d_block(
    inputs, 
    filters, 
    kernel_size, 
    strides,
    shuffle_size = 2,
    name_prefix = 'upsample1d_block_'):
    
    h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv')
    h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle')
    h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm')

    h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates')
    h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates')
    h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates')

    h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu')

    return h1_glu

Compared with the downsampling block, this block adds a pixel_shuffler step after each convolution:

def pixel_shuffler(inputs, shuffle_size = 2, name = None):

    # inputs has shape [batch_size, time, channels]
    n = tf.shape(inputs)[0]
    w = tf.shape(inputs)[1]
    c = inputs.get_shape().as_list()[2]

    # Trade channels for time steps: halve the channels, double the time length
    oc = c // shuffle_size
    ow = w * shuffle_size

    outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], name = name)

    return outputs
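
To see what this 1D shuffle does, here is a tiny illustration (values chosen only for demonstration): a sequence of 2 frames with 4 channels becomes 4 frames with 2 channels, with consecutive channel pairs moved onto the new time steps.

# One batch, 2 time steps, 4 channels
x = tf.constant([[[1, 2, 3, 4],
                  [5, 6, 7, 8]]])      # shape [1, 2, 4]

y = tf.reshape(x, [1, 4, 2])           # what pixel_shuffler does for shuffle_size = 2

# y == [[[1, 2],
#        [3, 4],
#        [5, 6],
#        [7, 8]]]                      # time doubled, channels halved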

The pixel shuffle operation is commonly used in image super-resolution to upsample a low-resolution image to a high-resolution one.
To recap the basic idea: in the 2D image case, given an input tensor of shape (batch_size, height, width, channels * shuffle_size^2), pixel shuffle rearranges it into (batch_size, height * shuffle_size, width * shuffle_size, channels); in the 1D case used here, (batch_size, time, channels * shuffle_size) becomes (batch_size, time * shuffle_size, channels). shuffle_size is the upsampling factor, i.e. how many times the spatial (or time) dimension is enlarged.
———————————————————————————————————————————
Note:
In TensorFlow 1.x, inputs.get_shape().as_list() is used to obtain a tensor's shape; in TensorFlow 2.x you can simply use inputs.shape, which is the more direct and recommended way.
Looking around online, some comments suggest that the logic in my pixel_shuffler function above may be wrong? (Something for me to study further.)
In a 2D pixel shuffle, the last (channel) dimension has to be rearranged so that each shuffle_size x shuffle_size block of channels is redistributed onto the output spatial dimensions, which means a reshape followed by a transpose on that dimension.
Here is a corrected pixel_shuffler for the 2D case, written for TensorFlow 2.x:

import tensorflow as tf

def pixel_shuffler(inputs, shuffle_size=2, name=None):
    with tf.name_scope(name or 'pixel_shuffler'):
        # Batch size, height, width and channel count of the input (assumed static)
        n, h, w, c = inputs.shape

        # The channel count must be divisible by shuffle_size^2
        assert c % (shuffle_size * shuffle_size) == 0, "The number of channels must be divisible by shuffle_size^2"

        # Output channel count, height and width
        oc = c // (shuffle_size * shuffle_size)
        oh = h * shuffle_size
        ow = w * shuffle_size

        # Split each pixel's channels into a (shuffle_size, shuffle_size, oc) block
        outputs = tf.reshape(inputs, [n, h, w, shuffle_size, shuffle_size, oc])

        # Interleave the two shuffle factors into the height and width dimensions
        outputs = tf.transpose(outputs, [0, 1, 3, 2, 4, 5])
        outputs = tf.reshape(outputs, [n, oh, ow, oc])

    return outputs

In this corrected function we first check that the channel count is divisible by shuffle_size * shuffle_size, then compute the output channel count, height and width. tf.reshape splits each pixel's channels into a shuffle_size x shuffle_size block, tf.transpose interleaves those blocks into the spatial dimensions, and a final tf.reshape produces the upsampled output, which is returned.

Note that this function assumes inputs is a four-dimensional tensor of shape (batch_size, height, width, channels) with statically known dimensions. If your input has a different shape (for example the 3D tensors used by the 1D generator above), you will need to adapt the code accordingly. A quick check of the result is shown below.
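
Under those assumptions, the result should match TensorFlow's built-in depth-to-space operation, which gives a convenient way to verify the rearrangement (a small sketch with arbitrary sizes):

x = tf.random.normal([1, 3, 5, 8])               # channels = 8 = 2 * 2 * 2

y1 = pixel_shuffler(x, shuffle_size=2)           # [1, 6, 10, 2]
y2 = tf.nn.depth_to_space(x, block_size=2)       # built-in equivalent

print(y1.shape)                                   # (1, 6, 10, 2)
print(tf.reduce_max(tf.abs(y1 - y2)).numpy())     # 0.0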
———————————————————————————————————————————

Discriminator

Discriminator architecture:
(figure: discriminator architecture diagram)

Discriminator code:

def discriminator(inputs, reuse = False, scope_name = 'discriminator'):

    # inputs has shape [batch_size, num_features, time]
    # we need to add channel for 2D convolution [batch_size, num_features, time, 1]
    inputs = tf.expand_dims(inputs, -1)

    with tf.variable_scope(scope_name) as scope:
        # Discriminator would be reused in CycleGAN
        if reuse:
            scope.reuse_variables()
        else:
            assert scope.reuse is False

        h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv')
        h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates')
        h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu')

        # Downsample
        d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_')
        d2 = downsample2d_block(inputs = d1, filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_')
        d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_')

        # Output
        o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid)

        return o1
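
The discriminator relies on two 2D counterparts of the earlier helpers, conv2d_layer and downsample2d_block, which are not listed in this post. Below is a minimal sketch of what they might look like, following exactly the same pattern as the 1D versions (a plain tf.layers.conv2d wrapper, and a convolution + instance norm + GLU block); treat it as an assumption rather than the original author's code.

def conv2d_layer(
    inputs, 
    filters, 
    kernel_size, 
    strides, 
    padding = 'same', 
    activation = None,
    kernel_initializer = None,
    name = None):

    conv_layer = tf.layers.conv2d(
        inputs = inputs,
        filters = filters,
        kernel_size = kernel_size,
        strides = strides,
        padding = padding,
        activation = activation,
        kernel_initializer = kernel_initializer,
        name = name)

    return conv_layer

def downsample2d_block(
    inputs, 
    filters, 
    kernel_size, 
    strides,
    name_prefix = 'downsample2d_block_'):

    h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv')
    h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm')
    h1_gates = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates')
    h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates')
    h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu')

    return h1_glu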

