TensorFlow: Implementing a Deep Convolutional Generative Adversarial Network (DCGAN) from Scratch

This article implements a deep convolutional generative adversarial network (DCGAN) in TensorFlow and trains it to generate portrait images. The training images come from the post 用DCGAN生成女朋友 ("Generating a girlfriend with DCGAN") and consist entirely of young women's face crops, roughly like this:

Figure 1: Portrait images used to train the DCGAN

Generative adversarial networks have been a hot research direction in deep learning in recent years, and variant after variant has been proposed: GAN, DCGAN, InfoGAN, WGAN, CycleGAN, and so on. Building on the GAN and DCGAN papers and on parts of the TensorFlow GAN source code, this article implements a simple DCGAN, runs a good number of experiments, and generates some fairly realistic images.

Admittedly, GitHub already hosts many DCGAN projects; the most starred is DCGAN-tensorflow, but after skimming its code I found it not very readable, so I decided to implement DCGAN from scratch to deepen my understanding of adversarial networks. The network structure itself is simple and easy to implement; the genuinely hard part is hyperparameter tuning, where slightly misadjusted parameters easily make training collapse and leave the generated images as pure noise.

1. Defining the DCGAN Network

According to Goodfellow, the inventor of generative adversarial networks (GANs), a GAN consists of two parts: a generator G and a discriminator D. The generator is like a counterfeiter trying to produce fake currency that passes for real, while the discriminator is like a banknote detector that tells genuine bills from fakes. This contest between forging and detecting creates the adversarial dynamic. When both the generator and the discriminator are sufficiently powerful, the contest tends toward an equilibrium in which the discriminator can no longer distinguish the generator's samples from real ones, judging every sample to be real with probability 0.5. That is the ideal case; in practice such an equilibrium is hard to reach, and only a fragile, dynamic balance is achievable: the generator produces samples realistic enough that the discriminator struggles to tell real from fake, but a slightly too large parameter update can break the balance and make the generator collapse abruptly. Its samples then grow noisier and noisier while the discriminator gains the upper hand, telling real from fake with ease, so that its loss quickly drops to 0. Since this is an adversarial game, though, the generator may bounce back from the bottom and stir up a new wave of convincing fakes, throwing the discriminator back into confusion. In general, GAN training is a cycle of reaching equilibrium, breaking it, reaching it again, and breaking it again, so its loss curves look like a roller-coaster ride of waves.

The deep learning models we usually encounter fall roughly into two classes: discriminative models and generative models. Discriminative models are trained on labeled data, e.g. classification, where a given sample must be assigned to a class; generative models instead must generate samples from the training data, or estimate the training data's distribution. Generative modeling is usually the harder problem, because the distribution's normalizing constant, the partition function, is hard to handle.

The GAN authors creatively combined a discriminative model and a generative model, greatly simplifying the generative modeling problem, though at the cost of unstable training. Below, taking the generation of images with some desired property as the example, namely generating the portrait images above, we briefly explain the principles and implementation of the deep convolutional generative adversarial network (DCGAN).

Suppose we have many portrait photos, and our goal is to design a network that can generate realistic-looking faces. A natural first question: what is the network's input? Goodfellow's idea is simple: the input is a sample from some random distribution (e.g. normal or uniform), typically a randomly drawn vector of fixed length, say 100 or 64. For our goal of generating face images, we then need to build a 3-dimensional image with 3 color channels out of this 1-dimensional vector. This relies on a technique called transposed convolution (transpose convolution, also known as deconvolution). Recall the overall structure of a convolutional network: starting from an image with 3 color channels, convolutions and pooling eventually yield a 1-dimensional output. Our task is evidently the inverse: going from a 1-dimensional vector to a 3-dimensional image, which is why transposed convolution is also called deconvolution. See the figure below:

Figure 2: The generator network structure of DCGAN

Let the randomly sampled vector be [x1, ..., xn] (n = 64 or 100, etc.). To feed it into a (transposed) convolutional network, it is first expanded into a 4-dimensional tensor of shape 1 x 1 x 1 x n. The first transposed convolution layer (kernel_size = 4, stride = 1, padding = 'VALID', as in the code below) produces a tensor of shape 1 x 4 x 4 x 1024 (opposite to an ordinary convolution, the spatial size grows); the second transposed convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') turns it into 1 x 8 x 8 x 512, and so on, until the fifth transposed convolution layer (kernel_size = 4, stride = 2, padding = 'SAME') yields a tensor of shape 1 x 64 x 64 x 64. At this point, to obtain a 3-channel image, it suffices to apply one more convolution layer (kernel_size = 1, stride = 1, num_outputs = 3, padding = 'SAME'), giving an output of shape 1 x 64 x 64 x 3; squeezing out index dimension 0 leaves a 64 x 64 color image.
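As a sanity check on these shapes, here is a minimal sketch that replays the arithmetic, assuming the standard transposed-convolution shape rules (out = (in - 1) * stride + kernel for 'VALID'; out = in * stride for 'SAME'); the constants match the model.py code further below:

import math

final_size, depth = 64, 64
num_layers = int(math.log(final_size, 2)) - 1  # 5 transposed-conv layers

size, channels = 1, 64  # input vector expanded to 1 x 1 x 1 x 64
print('input: 1 x 1 x %d' % channels)
for i in range(1, num_layers + 1):
    if i == 1:
        size = (size - 1) * 1 + 4     # 'VALID', stride 1, kernel 4 -> 4
    else:
        size = size * 2               # 'SAME', stride 2 doubles the size
    channels = depth * 2 ** (num_layers - i)  # 1024, 512, 256, 128, 64
    print('deconv%d: %d x %d x %d' % (i, size, size, channels))
print('logits: %d x %d x 3' % (size, size))   # final 1x1 conv to 3 channels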

That is the generator's network structure. Applying it turns a randomly sampled 1-dimensional vector into an image; but, crucially, that image is itself random and may be pure noise. For the generated images to exhibit facial features, some supervisory signal must be brought in to train the generator's parameters. That job falls to the discriminator. The discriminator's structure is essentially the generator's structure reversed (roughly the figure above read from right to left), except that its final output is a single score (a logit): the discriminator is a binary classifier that decides whether an image is a real training image or a generated fake. Being a deep network with strong fitting capacity, the discriminator easily extracts facial features from the training data, effectively providing a form of weak supervision (namely, the extracted facial features). The key question is how to make full use of this weak supervision.

As mentioned earlier, the adversarial process works like this: the generator tries to produce fakes realistic enough that the discriminator cannot tell real from fake, while the discriminator tries to sharpen its ability to spot the generator's fakes. For the generator, then, the closer its samples are to the training data, the better; for our face-generation task, the stronger the facial features of the generated samples, the better. And facial features are exactly what the discriminator provides, so the weakly supervised objective becomes: the discriminator's output on generated images should resemble its output on real training images. In other words, the generator should treat its own images as if they were real training images, while the discriminator should treat the generator's images as fakes. This yields the GAN loss function, the minimax objective from the original GAN paper:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

A more concrete way to understand it:

Given a randomly sampled vector z, the generator turns it into an image G(z); feeding G(z) to the discriminator D yields a score (logit) p = D(G(z)). The generator G aims to produce images practically indistinguishable from the real training samples x, so it wants every image it generates to be judged real; hence the generator's loss is:

generator_loss = sigmoid_cross_entropy(logits=p, labels=1)

The discriminator D, by contrast, wants its judgment to be as sharp as possible, so it should treat G(z) as a fake image, giving the discriminator's loss on generated images:

discriminator_loss_on_generated = sigmoid_cross_entropy(logits=p, labels=0)

On the other hand, to exploit the (weakly supervising) information in the real images, the discriminator's loss on the real training samples x is:

discriminator_loss_on_real = sigmoid_cross_entropy(logits=q, labels=1)

where q = D(x) is the score the discriminator outputs for a real training image x.
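Spelled out in TensorFlow 1.x, a minimal sketch of these three losses (using tf.nn.sigmoid_cross_entropy_with_logits directly, without the label smoothing that model.py below adds):

import tensorflow as tf

def gan_losses(d_real_logits, d_gen_logits):
    """Minimal GAN losses; both arguments are [batch_size, 1] logits."""
    generator_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(d_gen_logits), logits=d_gen_logits))
    dis_loss_on_generated = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.zeros_like(d_gen_logits), logits=d_gen_logits))
    dis_loss_on_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.ones_like(d_real_logits), logits=d_real_logits))
    return generator_loss, dis_loss_on_generated + dis_loss_on_real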

That covers the two most important pieces of the whole GAN: the network structures of the generator and the discriminator, and their corresponding losses. In practice, the losses above are usually smoothed somewhat (see the source code below, or the paper Improved Techniques for Training GANs); in addition, when optimizing the discriminator, the sum of its two losses is used:

discriminator_loss = discriminator_loss_on_generated + discriminator_loss_on_real
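For reference, a small sketch of what that smoothing does. As I read the TF/slim implementation of sigmoid_cross_entropy, labels are blended toward 0.5 before the cross entropy is computed:

# Label smoothing as applied by (slim / tf.losses) sigmoid_cross_entropy:
#     smoothed_labels = labels * (1 - s) + 0.5 * s,   s = label_smoothing
# With s = 0.25 the positive label 1.0 becomes 0.875, so the discriminator
# is never pushed to be fully confident that a real image is real.
s = 0.25
print(1.0 * (1 - s) + 0.5 * s)  # 0.875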

Altogether we have four losses, of which generator_loss and discriminator_loss are the ones backpropagated to optimize the network parameters. Implementing these ideas in TensorFlow gives the DCGAN model (saved as model.py):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat May 26 20:03:48 2018

@author: shirhe-lyh
"""

"""Implementation of DCGAN.

This work was first described in:
    Unsupervised representation learning with deep convolutional generative
    adversarial networks, Alec Radford et al., arXiv: 1511.06434v2

This module is based on:
    TensorFlow models/research/slim/nets/dcgan.py
    TensorFlow tensorflow/contrib/gan
"""

import math

import tensorflow as tf

slim = tf.contrib.slim


class DCGAN(object):
    """Implementation of DCGAN."""

    def __init__(self,
                 is_training,
                 generator_depth=64,
                 discriminator_depth=64,
                 final_size=32,
                 num_outputs=3,
                 fused_batch_norm=False):
        """Constructor.

        Args:
            is_training: Whether the network is for training or not.
            generator_depth: Number of channels in the last deconvolution
                layer of the generator network.
            discriminator_depth: Number of channels in the first convolution
                layer of the discriminator network.
            final_size: The spatial size (height = width) of the final output.
            num_outputs: Number of output features. For images, this is the
                number of channels.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.
        """
        self._is_training = is_training
        self._generator_depth = generator_depth
        self._discriminator_depth = discriminator_depth
        self._final_size = final_size
        self._num_outputs = num_outputs
        self._fused_batch_norm = fused_batch_norm

    def _validate_image_inputs(self, inputs):
        """Check whether the inputs are valid.

        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py

        Args:
            inputs: A float32 tensor with shape [batch_size, height, width,
                channels].

        Raises:
            ValueError: If the input image shape is not 4-dimensional, if the
                spatial dimensions aren't defined at graph construction time,
                if the spatial dimensions aren't square, or if the spatial
                dimensions aren't a power of two.
        """
        inputs.get_shape().assert_has_rank(4)
        inputs.get_shape()[1:3].assert_is_fully_defined()
        if inputs.get_shape()[1] != inputs.get_shape()[2]:
            raise ValueError('Input tensor does not have equal width and '
                             'height: ', inputs.get_shape()[1:3])
        width = inputs.get_shape().as_list()[2]
        if math.log(width, 2) != int(math.log(width, 2)):
            raise ValueError("Input tensor 'width' is not a power of 2: ",
                             width)

    def discriminator(self,
                      inputs,
                      depth=64,
                      is_training=True,
                      reuse=None,
                      scope='Discriminator',
                      fused_batch_norm=False):
        """Discriminator network for DCGAN.

        Constructs the discriminator network from inputs to the final
        endpoint.

        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py

        Args:
            inputs: A float32 tensor with shape [batch_size, height, width,
                channels].
            depth: Number of channels in the first convolution layer.
            is_training: Whether the network is for training or not.
            reuse: Whether or not the network variables should be reused.
                'scope' must be given to be reused.
            scope: Optional variable_scope. Default value is 'Discriminator'.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.

        Returns:
            logits: The pre-sigmoid activations, a float32 tensor with shape
                [batch_size, 1].
            end_points: A dictionary from components of the network to their
                activation.

        Raises:
            ValueError: If the input image shape is not 4-dimensional, if the
                spatial dimensions aren't defined at graph construction time,
                if the spatial dimensions aren't square, or if the spatial
                dimensions aren't a power of two.
        """
        normalizer_fn = slim.batch_norm
        normalizer_fn_args = {
            'is_training': is_training,
            'zero_debias_moving_mean': True,
            'fused': fused_batch_norm}

        self._validate_image_inputs(inputs)
        height = inputs.get_shape().as_list()[1]

        end_points = {}
        with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
            with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
                with slim.arg_scope([slim.conv2d], stride=2, kernel_size=4,
                                    activation_fn=tf.nn.leaky_relu):
                    net = inputs
                    # Each stride-2 convolution halves the spatial size and
                    # doubles the depth, down to a 1 x 1 feature map.
                    for i in range(int(math.log(height, 2))):
                        scope = 'conv%i' % (i+1)
                        current_depth = depth * 2**i
                        normalizer_fn_ = None if i == 0 else normalizer_fn
                        net = slim.conv2d(net, num_outputs=current_depth,
                                          normalizer_fn=normalizer_fn_,
                                          scope=scope)
                        end_points[scope] = net

                    logits = slim.conv2d(net, 1, kernel_size=1, stride=1,
                                         padding='VALID', normalizer_fn=None,
                                         activation_fn=None)
                    logits = tf.reshape(logits, [-1, 1])
                    end_points['logits'] = logits
        return logits, end_points

    def generator(self,
                  inputs,
                  depth=64,
                  final_size=32,
                  num_outputs=3,
                  is_training=True,
                  reuse=None,
                  scope='Generator',
                  fused_batch_norm=False):
        """Generator network for DCGAN.

        Constructs the generator network from inputs to the final endpoint.

        Copy from:
            https://github.com/tensorflow/models/blob/master/research/
            slim/nets/dcgan.py

        Args:
            inputs: A float32 tensor with shape [batch_size, N] for any
                size N.
            depth: Number of channels in the last deconvolution layer.
            final_size: The spatial size (height = width) of the final output.
            num_outputs: Number of output features. For images, this is the
                number of channels.
            is_training: Whether the network is for training or not.
            reuse: Whether or not the network variables should be reused.
                'scope' must be given to be reused.
            scope: Optional variable_scope. Default value is 'Generator'.
            fused_batch_norm: If 'True', use a faster, fused implementation
                of batch normalization.

        Returns:
            logits: The generated images (tanh outputs), a float32 tensor
                with shape [batch_size, final_size, final_size, num_outputs].
            end_points: A dictionary from components of the network to their
                activation.

        Raises:
            ValueError: If 'inputs' is not 2-dimensional, or if 'final_size'
                is not a power of 2 or is less than 8.
        """
        normalizer_fn = slim.batch_norm
        normalizer_fn_args = {
            'is_training': is_training,
            'zero_debias_moving_mean': True,
            'fused': fused_batch_norm}

        inputs.get_shape().assert_has_rank(2)
        if math.log(final_size, 2) != int(math.log(final_size, 2)):
            raise ValueError("'final_size' (%i) must be a power of 2."
                             % final_size)
        if final_size < 8:
            raise ValueError("'final_size' (%i) must be at least 8."
                             % final_size)

        end_points = {}
        num_layers = int(math.log(final_size, 2)) - 1
        with tf.variable_scope(scope, values=[inputs], reuse=reuse) as scope:
            with slim.arg_scope([normalizer_fn], **normalizer_fn_args):
                with slim.arg_scope([slim.conv2d_transpose],
                                    normalizer_fn=normalizer_fn,
                                    stride=2, kernel_size=4):
                    net = tf.expand_dims(tf.expand_dims(inputs, 1), 1)

                    # First upscaling is different because it takes the input
                    # vector: 1 x 1 -> 4 x 4 with 'VALID' padding.
                    current_depth = depth * 2 ** (num_layers - 1)
                    scope = 'deconv1'
                    net = slim.conv2d_transpose(net, current_depth, stride=1,
                                                padding='VALID', scope=scope)
                    end_points[scope] = net

                    for i in range(2, num_layers):
                        scope = 'deconv%i' % i
                        current_depth = depth * 2 ** (num_layers - i)
                        net = slim.conv2d_transpose(net, current_depth,
                                                    scope=scope)
                        end_points[scope] = net

                    # Last layer has different normalizer and activation.
                    scope = 'deconv%i' % num_layers
                    net = slim.conv2d_transpose(net, depth, normalizer_fn=None,
                                                activation_fn=None,
                                                scope=scope)
                    end_points[scope] = net

                    # Convert to the proper number of channels.
                    scope = 'logits'
                    logits = slim.conv2d(
                        net,
                        num_outputs,
                        normalizer_fn=None,
                        activation_fn=tf.nn.tanh,
                        kernel_size=1,
                        stride=1,
                        padding='VALID',
                        scope=scope)
                    end_points[scope] = logits

                    logits.get_shape().assert_has_rank(4)
                    logits.get_shape().assert_is_compatible_with(
                        [None, final_size, final_size, num_outputs])
        return logits, end_points

    def dcgan_model(self,
                    real_data,
                    generator_inputs,
                    generator_scope='Generator',
                    discriminator_scope='Discriminator',
                    check_shapes=True):
        """Returns DCGAN model outputs and variables.

        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/train.py

        Args:
            real_data: A float32 tensor with shape [batch_size, height, width,
                channels].
            generator_inputs: A float32 tensor with shape [batch_size, N] for
                any size N.
            generator_scope: Optional generator variable scope. Useful if you
                want to reuse a subgraph that has already been created.
            discriminator_scope: Optional discriminator variable scope. Useful
                if you want to reuse a subgraph that has already been created.
            check_shapes: If 'True', check that the generator produces tensors
                that have the same shape as the real data. Otherwise, skip
                this check.

        Returns:
            A dictionary containing output tensors.

        Raises:
            ValueError: If the generator outputs a tensor that isn't the same
                shape as 'real_data'.
        """
        # Create models.
        with tf.variable_scope(generator_scope) as gen_scope:
            generated_data, _ = self.generator(
                generator_inputs, self._generator_depth, self._final_size,
                self._num_outputs, self._is_training)
        with tf.variable_scope(discriminator_scope) as dis_scope:
            discriminator_gen_outputs, _ = self.discriminator(
                generated_data, self._discriminator_depth, self._is_training)
        # Reuse the same discriminator variables on the real data.
        with tf.variable_scope(dis_scope, reuse=True):
            discriminator_real_outputs, _ = self.discriminator(
                real_data, self._discriminator_depth, self._is_training)

        if check_shapes:
            if not generated_data.shape.is_compatible_with(real_data.shape):
                raise ValueError('Generator output shape (%s) must be the '
                                 'same shape as real data (%s).'
                                 % (generated_data.shape, real_data.shape))

        # Get model-specific variables.
        generator_variables = slim.get_trainable_variables(gen_scope)
        discriminator_variables = slim.get_trainable_variables(dis_scope)

        return {'generated_data': generated_data,
                'discriminator_gen_outputs': discriminator_gen_outputs,
                'discriminator_real_outputs': discriminator_real_outputs,
                'generator_variables': generator_variables,
                'discriminator_variables': discriminator_variables}

    def predict(self, generator_inputs):
        """Returns the results generated by the generator network.

        Args:
            generator_inputs: A float32 tensor with shape [batch_size, N] for
                any size N.

        Returns:
            logits: The generated images, a float32 tensor with shape
                [batch_size, final_size, final_size, num_outputs].
        """
        logits, _ = self.generator(generator_inputs, self._generator_depth,
                                   self._final_size, self._num_outputs,
                                   is_training=False)
        return logits

    def discriminator_loss(self,
                           discriminator_real_outputs,
                           discriminator_gen_outputs,
                           label_smoothing=0.25):
        """Original minimax discriminator loss for GANs, with label smoothing.

        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/losses/python/losses_impl.py

        Args:
            discriminator_real_outputs: Discriminator output on real data.
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).
            label_smoothing: The amount of smoothing for positive labels. This
                technique is taken from `Improved Techniques for Training
                GANs` (https://arxiv.org/abs/1606.03498). `0.0` means no
                smoothing.

        Returns:
            loss_dict: A dictionary containing three scalar tensors.
        """
        # Cross entropy with (smoothed) label 1: roughly -log(sigmoid(D(x))).
        losses_on_real = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_real_outputs,
            multi_class_labels=tf.ones_like(discriminator_real_outputs),
            label_smoothing=label_smoothing)
        loss_on_real = tf.reduce_mean(losses_on_real)
        # Cross entropy with label 0: -log(1 - sigmoid(D(G(z)))).
        losses_on_generated = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_gen_outputs,
            multi_class_labels=tf.zeros_like(discriminator_gen_outputs))
        loss_on_generated = tf.reduce_mean(losses_on_generated)
        loss = loss_on_real + loss_on_generated
        return {'dis_loss': loss,
                'dis_loss_on_real': loss_on_real,
                'dis_loss_on_generated': loss_on_generated}

    def generator_loss(self, discriminator_gen_outputs, label_smoothing=0.0):
        """Modified generator loss for DCGAN.

        Modified from:
            https://github.com/tensorflow/tensorflow/blob/master/tensorflow/
            contrib/gan/python/losses/python/losses_impl.py

        Args:
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).
            label_smoothing: The amount of smoothing for positive labels.

        Returns:
            loss: A scalar tensor.
        """
        # Cross entropy with label 1: -log(sigmoid(D(G(z)))).
        losses = slim.losses.sigmoid_cross_entropy(
            logits=discriminator_gen_outputs,
            multi_class_labels=tf.ones_like(discriminator_gen_outputs),
            label_smoothing=label_smoothing)
        loss = tf.reduce_mean(losses)
        return loss

    def loss(self, discriminator_real_outputs, discriminator_gen_outputs):
        """Computes the losses of DCGAN.

        Args:
            discriminator_real_outputs: Discriminator output on real data.
            discriminator_gen_outputs: Discriminator output on generated data.
                Expected to be in the range of (-inf, inf).

        Returns:
            A dictionary containing 4 scalar tensors.
        """
        dis_loss_dict = self.discriminator_loss(discriminator_real_outputs,
                                                discriminator_gen_outputs)
        gen_loss = self.generator_loss(discriminator_gen_outputs)
        dis_loss_dict.update({'gen_loss': gen_loss})
        return dis_loss_dict
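Before training, it is worth checking that the graph builds and the shapes line up. A minimal smoke test of model.py (a sketch; the placeholder shapes match the training script in the next section):

import tensorflow as tf

import model

# Build the full DCGAN graph on dummy placeholders.
real_data = tf.placeholder(tf.float32, [None, 64, 64, 3])
noise = tf.placeholder(tf.float32, [None, 64])
dcgan = model.DCGAN(is_training=True, final_size=64)
outputs = dcgan.dcgan_model(real_data, noise)
print(outputs['generated_data'].shape)             # (?, 64, 64, 3)
print(outputs['discriminator_gen_outputs'].shape)  # (?, 1)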

2. Training and Generating Images

The authors of the DCGAN paper summarized the architecture guidelines that let them train GANs stably for unsupervised image generation:

replace pooling layers with strided convolutions in the discriminator and fractional-strided (transposed) convolutions in the generator;
use batch normalization in both the generator and the discriminator;
remove fully connected hidden layers for deeper architectures;
use ReLU activation in the generator for all layers except the output, which uses Tanh;
use LeakyReLU activation in the discriminator for all layers.

The code above (model.py) follows these techniques essentially faithfully. The minor differences are:

the vector sampled from the random distribution has length 64 rather than the paper's 100;

the real training images must have resolution n x n, where n is a power of 2;

the generated images likewise have resolution m x m, where m is a power of 2.

This section deals with actually training the DCGAN. First, the training script (named train.py) is listed in full:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun May 27 16:55:12 2018

@author: shirhe-lyh
"""

"""Train a DCGAN to generate fake images.

Example Usage:
---------------
python3 train.py \
    --images_dir: Path to real images directory.
    --images_pattern: The pattern of input images.
    --generated_images_save_dir: Path to directory where to write gen images.
    --logdir: Path to log directory.
    --num_steps: Number of steps.
"""

import cv2
import glob
import numpy as np
import os
import tensorflow as tf

import model

flags = tf.flags

flags.DEFINE_string('images_dir', None, 'Path to real images directory.')
flags.DEFINE_string('images_pattern', '*.jpg', 'The pattern of input images.')
flags.DEFINE_string('generated_images_save_dir', None, 'Path to directory '
                    'where to write generated images.')
flags.DEFINE_string('logdir', './training', 'Path to log directory.')
flags.DEFINE_integer('num_steps', 20000, 'Number of steps.')

FLAGS = flags.FLAGS


def get_next_batch(batch_size=64):
    """Gets a batch of real images and randomly generated inputs."""
    if not os.path.exists(FLAGS.images_dir):
        raise ValueError('images_dir does not exist.')
    images_path = os.path.join(FLAGS.images_dir, FLAGS.images_pattern)
    image_files_list = glob.glob(images_path)
    image_files_arr = np.array(image_files_list)
    selected_indices = np.random.choice(len(image_files_list), batch_size)
    selected_image_files = image_files_arr[selected_indices]
    images = read_images(selected_image_files)
    # generated_inputs = np.random.normal(size=[batch_size, 64])
    generated_inputs = np.random.uniform(
        low=-1, high=1.0, size=[batch_size, 64])
    return images, generated_inputs


def read_images(image_files):
    """Reads images by OpenCV."""
    images = []
    for image_path in image_files:
        image = cv2.imread(image_path)
        image = cv2.resize(image, (64, 64))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Scale images from [0, 255] to [-1, 1].
        image = (image - 127.5) / 127.5
        images.append(image)
    return np.array(images)


def write_images(generated_images, images_save_dir, num_step):
    """Writes images to a given directory."""
    # Scale images from [-1, 1] to [0, 255].
    generated_images = ((generated_images + 1) * 127.5).astype(np.uint8)
    for j, image in enumerate(generated_images):
        image_name = 'generated_step{}_{}.jpg'.format(num_step+1, j+1)
        image_path = os.path.join(images_save_dir, image_name)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        cv2.imwrite(image_path, image)


def main(_):
    # Define placeholders.
    real_data = tf.placeholder(
        tf.float32, shape=[None, 64, 64, 3], name='real_data')
    generated_inputs = tf.placeholder(
        tf.float32, [None, 64], name='generated_inputs')

    # Create the DCGAN model.
    dcgan_model = model.DCGAN(is_training=True, final_size=64)
    outputs_dict = dcgan_model.dcgan_model(real_data, generated_inputs)
    generated_data = outputs_dict['generated_data']
    generated_data_ = tf.identity(generated_data, name='generated_data')
    discriminator_gen_outputs = outputs_dict['discriminator_gen_outputs']
    discriminator_real_outputs = outputs_dict['discriminator_real_outputs']
    generator_variables = outputs_dict['generator_variables']
    discriminator_variables = outputs_dict['discriminator_variables']
    loss_dict = dcgan_model.loss(discriminator_real_outputs,
                                 discriminator_gen_outputs)
    discriminator_loss = loss_dict['dis_loss']
    discriminator_loss_on_real = loss_dict['dis_loss_on_real']
    discriminator_loss_on_generated = loss_dict['dis_loss_on_generated']
    generator_loss = loss_dict['gen_loss']

    # Write loss values to logdir (tensorboard).
    tf.summary.scalar('discriminator_loss', discriminator_loss)
    tf.summary.scalar('discriminator_loss_on_real',
                      discriminator_loss_on_real)
    tf.summary.scalar('discriminator_loss_on_generated',
                      discriminator_loss_on_generated)
    tf.summary.scalar('generator_loss', generator_loss)
    merged_summary = tf.summary.merge_all(key=tf.GraphKeys.SUMMARIES)

    # Create optimizers.
    discriminator_optimizer = tf.train.AdamOptimizer(
        learning_rate=0.0004,  # 0.0005
        beta1=0.5)
    discriminator_train_step = discriminator_optimizer.minimize(
        discriminator_loss, var_list=discriminator_variables)
    generator_optimizer = tf.train.AdamOptimizer(learning_rate=0.0001,
                                                 beta1=0.5)
    generator_train_step = generator_optimizer.minimize(
        generator_loss, var_list=generator_variables)

    saver = tf.train.Saver(var_list=tf.global_variables())
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)

        # Write the model graph to tensorboard.
        if not FLAGS.logdir:
            raise ValueError('logdir is not specified.')
        if not os.path.exists(FLAGS.logdir):
            os.makedirs(FLAGS.logdir)
        writer = tf.summary.FileWriter(FLAGS.logdir, sess.graph)

        # Fixed inputs, reused every 500 steps so the evolution of the
        # generated images can be observed.
        fixed_images, fixed_generated_inputs = get_next_batch()
        for i in range(FLAGS.num_steps):
            if (i+1) % 500 == 0:
                batch_images = fixed_images
                batch_generated_inputs = fixed_generated_inputs
            else:
                batch_images, batch_generated_inputs = get_next_batch()
            train_dict = {real_data: batch_images,
                          generated_inputs: batch_generated_inputs}

            # Update the discriminator network once ...
            sess.run(discriminator_train_step, feed_dict=train_dict)
            # ... then update the generator network five times.
            sess.run(generator_train_step, feed_dict=train_dict)
            sess.run(generator_train_step, feed_dict=train_dict)
            sess.run(generator_train_step, feed_dict=train_dict)
            sess.run(generator_train_step, feed_dict=train_dict)
            sess.run(generator_train_step, feed_dict=train_dict)

            summary, generated_images = sess.run(
                [merged_summary, generated_data], feed_dict=train_dict)

            # Write loss values to tensorboard.
            writer.add_summary(summary, i+1)

            if (i+1) % 500 == 0:
                # Save the model.
                model_save_path = os.path.join(FLAGS.logdir, 'model.ckpt')
                saver.save(sess, save_path=model_save_path, global_step=i+1)

                # Save the generated images.
                if not FLAGS.generated_images_save_dir:
                    FLAGS.generated_images_save_dir = './generated_images'
                if not os.path.exists(FLAGS.generated_images_save_dir):
                    os.makedirs(FLAGS.generated_images_save_dir)
                write_images(
                    generated_images, FLAGS.generated_images_save_dir, i)

        writer.close()


if __name__ == '__main__':
    tf.app.run()

This file defines four functions, from top to bottom: get_next_batch, which randomly samples a batch of training data; read_images, which reads training images from a local directory; write_images, which saves the generator's images to a given directory; and main, which trains the whole DCGAN. The first three are short and simple, so we skip them and look only at main. It first defines two placeholders that serve as the data entry points. Next, it instantiates a DCGAN object and applies it to the placeholders, obtaining the model outputs and the four losses; the five tf.summary statements that follow write the losses to the log files, so their evolution can be viewed in the browser with tensorboard. Then two optimizers are defined, discriminator_optimizer and generator_optimizer, to optimize the discriminator and the generator losses respectively. Finally, after defining the model saver saver and writing the model graph to the log directory, we arrive at the training loop (the for loop):

randomly select a batch of training samples from the training images;

for every discriminator update, perform five consecutive generator updates;

every 500 steps, save the generated images and a model checkpoint.

In addition, so the evolution of the generated images can be observed, the same input data is reused at every 500th step.

As for how to train a GAN, the GAN paper explains it clearly: its Algorithm 1 alternates k minibatch gradient steps on the discriminator with a single gradient step on the generator.

The point to pay attention to is this: there, the discriminator is trained k times for each single generator update. In my own understanding (which may be wrong), it should be the other way around: the generator should be trained k times for each single discriminator update. The reason is that, early in training, the generator's samples differ wildly from the real training samples and the discriminator can tell them apart effortlessly, so the loss discriminator_loss_on_generated plummets toward 0. To slow that fall, and to give the generator enough training to reach decent sample quality quickly, the generator is optimized k times in a row.

Back to our face-generation problem: after many experiments I settled on k = 5, i.e. the discriminator is optimized once for every 5 generator optimizations. Training this way keeps discriminator_loss_on_generated and generator_loss in a fairly long adversarial balance, so the generator is optimized over a long stretch and produces higher-quality images, as shown in the sketch below.
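The update schedule inside train.py's loop, rewritten with a configurable k (a sketch; train.py simply unrolls the five generator steps):

k = 5  # generator updates per discriminator update
for i in range(FLAGS.num_steps):
    batch_images, batch_generated_inputs = get_next_batch()
    train_dict = {real_data: batch_images,
                  generated_inputs: batch_generated_inputs}
    sess.run(discriminator_train_step, feed_dict=train_dict)  # 1 D step
    for _ in range(k):                                        # k G steps
        sess.run(generator_train_step, feed_dict=train_dict)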

From a terminal in the project directory, run:

python3 train.py --images_dir path/to/images/directory

This creates a new folder, training, in the current directory, which holds everything produced during training, such as the model parameters. Then start tensorboard:

tensorboard --logdir ./training

Open the URL the terminal prints; on the SCALARS page you will see four loss curves. To build a deeper feel for GANs, watch carefully how these curves evolve and think about how to adjust the parameters so the network generates more realistic images.

The optimizer parameters in train.py's main function were settled after many trials. They are still not entirely satisfactory, but they already yield fairly good images; for example, here are the images generated after 15,500 steps (all generated images are saved in the folder generated_images):

Figure 3: Images produced by the generator after 15,500 training steps

As you can see, the overall quality of the generated images is already quite good. Cherry-picking the more satisfying ones, the generated faces below could probably pass for real:

Figure 4: High-quality images produced by the generator

Of course, the sharpness still needs work.

3. Training Details

When training a GAN, the key things to tune are the learning rates of the two optimizers and the number k of generator updates per discriminator update. To simplify the choice of learning rates, use the adaptive optimizer Adam, typically with an initial learning rate of 0.0001; when tuning, fix one learning rate and concentrate on adjusting the other. During tuning, make sure that discriminator_loss_on_generated does not keep falling, or equivalently that generator_loss does not keep rising; ideally, both settle into stable oscillation around some value. As a rule of thumb, if after 500 steps the images in generated_images are all blurry mush, the current learning rates are poor, and you should interrupt training and re-tune; if the images already show faint facial features, training can continue. Below are the loss curves from one of my runs (all parameters identical to train.py):
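One simple way to catch a collapsing run early is to print the two critical losses periodically; a hypothetical snippet that could be dropped into train.py's training loop:

# Hypothetical monitoring addition for train.py's loop: if d_gen keeps
# shrinking toward 0 while g keeps growing, the discriminator is winning
# and the learning rates (or k) need re-tuning.
if (i + 1) % 100 == 0:
    d_gen, g = sess.run(
        [discriminator_loss_on_generated, generator_loss],
        feed_dict=train_dict)
    print('step %d: dis_loss_on_generated=%.4f, gen_loss=%.4f'
          % (i + 1, d_gen, g))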

Figure 5: Loss curves over 20,000 training steps

In the figure, discriminator_loss_on_generated and generator_loss are in equilibrium until about step 5,000, during which the generated images grow steadily sharper. After step 5,500, generator_loss rises rapidly and the generated images all degrade to noise (see below); then, after step 8,000, generator_loss plunges back to a low level and quality recovers. After step 16,000, as generator_loss grows again, the images blur once more. The images generated along the way are shown below (those after step 16,000 are all blurry and are omitted):

Figure 6: Images generated during training; compare against the generator_loss curve

Finally, one point worth noting about the random distribution fed to the generator: if a normal distribution is used (see the commented-out line in get_next_batch):

generated_inputs = np.random.normal(size=[batch_size, 64])

then many of the generated images turn out similar to one another. Among the 64 images generated at step 17,500:

Figure 7: Images generated when a standard normal distribution is used as the generator input

images 12, 14, 21, 26, 27, 39, 46 and 49, as well as images 9, 22, 28, 33, 42, 47, 56 and 61, are all very similar (suggesting that samples drawn from a standard normal are themselves alike, which makes this choice better suited to conditional GANs). Using a uniform distribution instead:

generated_inputs = np.random.uniform(low=-1, high=1.0, size=[batch_size, 64])

largely alleviates the problem; see Figure 3.
