[Code Reading] WarpGAN: Automatic Caricature Generation

Code link

Reference book: TensorFlow in Practice: Google Deep Learning Framework

Chapter 3 of that book gives a clear picture of how a TensorFlow graph is built and how a neural network is trained.

1. train.py

This file defines the main function:

def main(args):


    # Initialization for running
    if config.save_model:
        log_dir = utils.create_log_dir(config, config_file)
        summary_writer = tf.summary.FileWriter(log_dir, network.graph)
    if config.restore_model:
        network.restore_model(config.restore_model, config.restore_scopes)

    proc_func = lambda images: preprocess(images, config, True)
    trainset.start_batch_queue(config.batch_size, proc_func=proc_func)

The config parameters here all come from the file WarpGAN\config\default.py

Dataset operations such as reading and batch-queue initialization come from the file WarpGAN\utils\dataset.py
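The proc_func lambda in the snippet above binds config and the training flag once, so the batch queue only ever needs a one-argument function. A minimal framework-free sketch of this pattern (the preprocess body and the config dict here are stand-ins, not WarpGAN's real implementation):

```python
# Stand-in preprocess: the real one in WarpGAN does image augmentation;
# here it just scales pixel values so the pattern is runnable.
def preprocess(images, config, is_training):
    scale = config["scale"] if is_training else 1.0
    return [x * scale for x in images]

config = {"scale": 2.0}

# Bind the extra arguments once; the queue only ever calls proc_func(images).
proc_func = lambda images: preprocess(images, config, True)

print(proc_func([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
```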

Main loop:

# Main Loop
    print('\nStart Training\nname: {}\n# epochs: {}\nepoch_size: {}\nbatch_size: {}\n'.format(
            config.name, config.num_epochs, config.epoch_size, config.batch_size))
    global_step = 0
    start_time = time.time()
    for epoch in range(config.num_epochs):

        if epoch == 0: test(network, config, log_dir, global_step)

        # Training
        for step in range(config.epoch_size):
            # Prepare input
            learning_rate = utils.get_updated_learning_rate(global_step, config)
            batch = trainset.pop_batch_queue()

            wl, sm, global_step = network.train(batch['images'], batch['labels'], batch['is_photo'], learning_rate, config.keep_prob)

            wl['lr'] = learning_rate

            # Display
            if step % config.summary_interval == 0:
                duration = time.time() - start_time
                start_time = time.time()
                utils.display_info(epoch, step, duration, wl)
                if config.save_model:
                    summary_writer.add_summary(sm, global_step=global_step)

wl, sm, global_step = network.train(batch['images'], batch['labels'], batch['is_photo'], learning_rate, config.keep_prob)

This line is the key point: it invokes one training step of the network.
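Inside the loop, utils.get_updated_learning_rate computes the learning rate for the current step from the schedule in config. A hedged sketch of a typical step-decay schedule (the function name is real, but this body and the schedule format are my assumption, not the actual WarpGAN code):

```python
def get_updated_learning_rate(global_step, base_lr, schedule):
    """Return base_lr scaled by the multiplier of the last threshold passed.

    schedule maps a step threshold to a multiplier (assumed format).
    """
    lr = base_lr
    for threshold in sorted(schedule):
        if global_step >= threshold:
            lr = base_lr * schedule[threshold]
    return lr

schedule = {5000: 0.1, 10000: 0.01}
get_updated_learning_rate(0, 1e-4, schedule)      # no threshold passed: base rate
get_updated_learning_rate(12000, 1e-4, schedule)  # both passed: base * 0.01
```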

2. warpgan.py

This file defines the computation graph, the forward propagation, and the loss functions of the WarpGAN network.

The process of training a neural network can be summarized in three steps:

1) Define the structure of the neural network and the output of forward propagation

2) Define the loss function (computed from the output of forward propagation) and the back-propagation optimization algorithm

3) Create a session (tf.Session()) and repeatedly run the back-propagation optimization on the training data
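The three steps above can be sketched framework-free with NumPy, training a tiny linear model y = w*x by gradient descent (a toy illustration of the recipe, not WarpGAN code):

```python
import numpy as np

# 1) Define the model structure and forward propagation.
w = 0.0
def forward(x, w):
    return w * x

# Toy data generated from y = 3x.
x = np.array([1.0, 2.0, 3.0])
y = 3.0 * x

# 3) Repeatedly run the optimization on the training data.
lr = 0.05
for step in range(200):
    pred = forward(x, w)
    # 2) Loss (mean squared error) and its gradient w.r.t. w.
    grad = np.mean(2.0 * (pred - y) * x)
    w -= lr * grad

print(round(w, 3))  # converges to 3.0
```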

    def train(self, images_batch, labels_batch, switch_batch, learning_rate, keep_prob):
        images_A = images_batch[~switch_batch]
        images_B = images_batch[switch_batch]
        labels_A = labels_batch[~switch_batch]
        labels_B = labels_batch[switch_batch]
        scales_A = np.ones((images_A.shape[0]))
        scales_B = np.ones((images_B.shape[0]))
        feed_dict = {   self.images_A: images_A,
                        self.images_B: images_B,
                        self.labels_A: labels_A,
                        self.labels_B: labels_B,
                        self.scales_A: scales_A,
                        self.scales_B: scales_B,
                        self.learning_rate: learning_rate,
                        self.keep_prob: keep_prob,
                        self.phase_train: True,}
        _, wl, sm = self.sess.run([self.train_op, self.watch_list, self.summary_op], feed_dict = feed_dict)

        step = self.sess.run(self.global_step)

        return wl, sm, step
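The first four lines of train split the mixed batch into the two domains using the boolean is_photo mask (switch_batch). With NumPy arrays this boolean indexing works as follows (toy data, not WarpGAN's):

```python
import numpy as np

images_batch = np.arange(12).reshape(4, 3)           # 4 toy "images", 3 values each
switch_batch = np.array([True, False, True, False])  # the boolean is_photo flags

images_A = images_batch[~switch_batch]  # rows where the flag is False
images_B = images_batch[switch_batch]   # rows where the flag is True

print(images_A.shape, images_B.shape)  # (2, 3) (2, 3)
```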

The train function is called from train.py above; it feeds one batch into the graph and runs the session.

Here self.train_op, self.watch_list, self.summary_op, and self.global_step are operations and tensors in the graph.

We mainly focus on self.train_op, the op that updates the parameters.

It is defined in the initialize function, which builds the forward propagation and the loss functions:

    def initialize(self, config, num_classes=None):
        '''
            Initialize the graph from scratch according to config.
        '''
        with self.graph.as_default():
            with self.sess.as_default():
                # Set up placeholders
                h, w = config.image_size
                channels = config.channels
                self.images_A = tf.placeholder(tf.float32, shape=[None, h, w, channels], name='images_A')
                self.images_B = tf.placeholder(tf.float32, shape=[None, h, w, channels], name='images_B')
                self.labels_A = tf.placeholder(tf.int32, shape=[None], name='labels_A')
                self.labels_B = tf.placeholder(tf.int32, shape=[None], name='labels_B')
                self.scales_A = tf.placeholder(tf.float32, shape=[None], name='scales_A')
                self.scales_B = tf.placeholder(tf.float32, shape=[None], name='scales_B')

                self.learning_rate = tf.placeholder(tf.float32, name='learning_rate')
                self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')
                self.phase_train = tf.placeholder(tf.bool, name='phase_train')
                self.global_step = tf.Variable(0, trainable=False, dtype=tf.int32, name='global_step')

                self.setup_network_model(config, num_classes)

                # Build generator
                encode_A, styles_A = self.encoder(self.images_A)
                encode_B, styles_B = self.encoder(self.images_B)

                deform_BA, render_BA, ldmark_pred, ldmark_diff = self.decoder(encode_B, self.scales_B, None)
                render_AA = self.decoder(encode_A, self.scales_A, styles_A, texture_only=True)
                render_BB = self.decoder(encode_B, self.scales_B, styles_B, texture_only=True)

                self.styles_A = tf.identity(styles_A, name='styles_A')
                self.styles_B = tf.identity(styles_B, name='styles_B')
                self.deform_BA = tf.identity(deform_BA, name='deform_BA')
                self.ldmark_pred = tf.identity(ldmark_pred, name='ldmark_pred')
                self.ldmark_diff = tf.identity(ldmark_diff, name='ldmark_diff')

                # Build discriminator for real images
                patch_logits_A, logits_A = self.discriminator(self.images_A)
                patch_logits_B, logits_B = self.discriminator(self.images_B)
                patch_logits_BA, logits_BA = self.discriminator(deform_BA)                          

                # Show images in TensorBoard
                image_grid_A = tf.stack([self.images_A, render_AA], axis=1)[:1]
                image_grid_B = tf.stack([self.images_B, render_BB], axis=1)[:1]
                image_grid_BA = tf.stack([self.images_B, deform_BA], axis=1)[:1]
                image_grid = tf.concat([image_grid_A, image_grid_B, image_grid_BA], axis=0)
                image_grid = tf.reshape(image_grid, [-1] + list(self.images_A.shape[1:]))
                image_grid = self.image_grid(image_grid, (3,2))
                tf.summary.image('image_grid', image_grid)

                # Build all losses
                self.watch_list = {}
                loss_list_G  = []
                loss_list_D  = []
                # Adversarial loss for deform_BA
                loss_D, loss_G = self.cls_adv_loss(logits_A, logits_B, logits_BA,
                    self.labels_A, self.labels_B, self.labels_B, num_classes)
                loss_D, loss_G = config.coef_adv*loss_D, config.coef_adv*loss_G
                loss_list_D.append(loss_D)
                loss_list_G.append(loss_G)

                self.watch_list['LDg'] = loss_D
                self.watch_list['LGg'] = loss_G

                # Patch adversarial loss for deform_BA
                loss_D, loss_G = self.patch_adv_loss(patch_logits_A, patch_logits_B, patch_logits_BA)
                loss_D, loss_G = config.coef_patch_adv*loss_D, config.coef_patch_adv*loss_G
                loss_list_D.append(loss_D)
                loss_list_G.append(loss_G)

                self.watch_list['LDp'] = loss_D
                self.watch_list['LGp'] = loss_G

                # Identity Mapping (Reconstruction) loss
                loss_idt_A = tf.reduce_mean(tf.abs(render_AA - self.images_A), name='idt_loss_A')
                loss_idt_A = config.coef_idt * loss_idt_A
                loss_list_G.append(loss_idt_A)

                loss_idt_B = tf.reduce_mean(tf.abs(render_BB - self.images_B), name='idt_loss_B')
                loss_idt_B = config.coef_idt * loss_idt_B
                loss_list_G.append(loss_idt_B)

                self.watch_list['idtA'] = loss_idt_A
                self.watch_list['idtB'] = loss_idt_B

                # Collect all losses
                reg_loss = tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), name='reg_loss')
                self.watch_list['reg_loss'] = reg_loss

                loss_G = tf.add_n(loss_list_G, name='loss_G')
                grads_G = tf.gradients(loss_G, self.G_vars)

                loss_D = tf.add_n(loss_list_D, name='loss_D')
                grads_D = tf.gradients(loss_D, self.D_vars)

                # Training Operators
                train_ops = []

                opt_G = tf.train.AdamOptimizer(self.learning_rate, beta1=0.5, beta2=0.9)
                opt_D = tf.train.AdamOptimizer(self.learning_rate, beta1=0.5, beta2=0.9)
                apply_G_gradient_op = opt_G.apply_gradients(list(zip(grads_G, self.G_vars)))
                apply_D_gradient_op = opt_D.apply_gradients(list(zip(grads_D, self.D_vars)))

                update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
                train_ops.extend([apply_G_gradient_op, apply_D_gradient_op] + update_ops)

                train_ops.append(tf.assign_add(self.global_step, 1))
                self.train_op = tf.group(*train_ops)

                # Collect TF summary
                for k,v in self.watch_list.items():
                    tf.summary.scalar('losses/' + k, v)
                tf.summary.scalar('learning_rate', self.learning_rate)
                self.summary_op = tf.summary.merge_all()

                # Initialize variables
                self.saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=99)
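The identity losses above (idt_loss_A, idt_loss_B) are plain L1 reconstruction terms: the mean absolute difference between the re-rendered image and the input, weighted by config.coef_idt. In NumPy, with toy values (the coefficient here is illustrative, not WarpGAN's actual setting):

```python
import numpy as np

render_AA = np.array([[0.2, 0.4], [0.6, 0.8]])  # toy reconstruction
images_A  = np.array([[0.0, 0.5], [0.5, 1.0]])  # toy input

loss_idt_A = np.mean(np.abs(render_AA - images_A))  # L1 / mean absolute error
print(loss_idt_A)  # ~0.15

coef_idt = 10.0                      # stand-in for config.coef_idt
weighted = coef_idt * loss_idt_A
```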


According to the forward propagation defined here, we can draw the following forward propagation diagram:

Combined with the network structure diagram in the paper, we can draw the following detailed process diagram:

3. WarpGAN\models\default.py

This file defines the detailed structure of the three networks called by warpgan.py: the encoder, the decoder, and the discriminator.

To find out how the feature points are trained, I mainly look at WarpController, the sub-network of the decoder that generates the facial feature points.

                    with tf.variable_scope('WarpController'):

                        print('-- WarpController')

                        net = encoded
                        warp_input = tf.identity(images_rendered, name='warp_input')

                        net = slim.flatten(net)

                        net = slim.fully_connected(net, 128, scope='fc1')
                        print('module fc1 shape:', [dim.value for dim in net.shape])

                        num_ldmark = 16

                        # Predict the control points
                        ldmark_mean = (np.random.normal(0,50, (num_ldmark,2)) + np.array([[0.5*h,0.5*w]])).flatten()
                        ldmark_mean = tf.Variable(ldmark_mean.astype(np.float32), name='ldmark_mean')
                        print('ldmark_mean shape:', [dim.value for dim in ldmark_mean.shape])

                        ldmark_pred = slim.fully_connected(net, num_ldmark*2, 
                            normalizer_fn=None, activation_fn=None, biases_initializer=None, scope='fc_ldmark')
                        ldmark_pred = ldmark_pred + ldmark_mean
                        print('ldmark_pred shape:', [dim.value for dim in ldmark_pred.shape])
                        ldmark_pred = tf.identity(ldmark_pred, name='ldmark_pred')

                        # Predict the displacements
                        ldmark_diff = slim.fully_connected(net, num_ldmark*2, 
                            normalizer_fn=None,  activation_fn=None, scope='fc_diff')
                        print('ldmark_diff shape:', [dim.value for dim in ldmark_diff.shape])
                        ldmark_diff = tf.identity(ldmark_diff, name='ldmark_diff')
                        ldmark_diff = tf.identity(tf.reshape(scales,[-1,1]) * ldmark_diff, name='ldmark_diff_scaled')

                        src_pts = tf.reshape(ldmark_pred, [-1, num_ldmark ,2])
                        dst_pts = tf.reshape(ldmark_pred + ldmark_diff, [-1, num_ldmark, 2])

                        diff_norm = tf.reduce_mean(tf.norm(src_pts-dst_pts, axis=[1,2]))
                        # tf.summary.scalar('diff_norm', diff_norm)
                        # tf.summary.scalar('mark', ldmark_pred[0,0])

                        images_transformed, dense_flow = sparse_image_warp(warp_input, src_pts, dst_pts,
                                regularization_weight = 1e-6, num_boundary_points=0)
                        dense_flow = tf.identity(dense_flow, name='dense_flow')

My understanding is as follows:

1) Feature points: ldmark_mean + ldmark_pred. ldmark_mean is initialized as random points around the image center (Gaussian noise added to the center coordinates) and is updated during training as a trainable variable. The offset comes from the fc_ldmark layer, whose input is the encoded image, so the final landmark positions (the fc output plus ldmark_mean) are a learned refinement of the mean positions.

2) Displacements of the feature points: obtained by passing the encoded image through another fully connected layer (fc_diff), then scaled per sample by scales.
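These two points can be checked with a small NumPy sketch: the landmarks are a learned mean plus a predicted offset, and the displacements are scaled per sample. Shapes follow the code above; the weights here are random stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, feat, num_ldmark, h, w = 2, 128, 16, 256, 256

net = rng.normal(size=(batch, feat))     # encoded features after fc1

# ldmark_mean: random points around the image centre (trainable in the model)
ldmark_mean = (rng.normal(0, 50, (num_ldmark, 2))
               + np.array([[0.5 * h, 0.5 * w]])).flatten()

# fc_ldmark / fc_diff: linear layers without activation (bias-free stand-ins)
W_ldmark = rng.normal(scale=0.01, size=(feat, num_ldmark * 2))
W_diff = rng.normal(scale=0.01, size=(feat, num_ldmark * 2))

ldmark_pred = net @ W_ldmark + ldmark_mean   # learned mean + predicted offset
ldmark_diff = net @ W_diff                   # displacement per landmark
scales = np.ones((batch, 1))
ldmark_diff_scaled = scales * ldmark_diff    # per-sample scaling

src_pts = ldmark_pred.reshape(-1, num_ldmark, 2)
dst_pts = (ldmark_pred + ldmark_diff_scaled).reshape(-1, num_ldmark, 2)
print(src_pts.shape, dst_pts.shape)  # (2, 16, 2) (2, 16, 2)
```

src_pts and dst_pts then drive sparse_image_warp, which interpolates the sparse landmark displacements into a dense flow field.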

The relationship between deformation and style transfer is explained as follows:

Unlike other visual style transfer tasks, this paper involves not only texture differences but also transformations of geometric coordinates. Texture exaggerates local fine-grained features, such as the depth of wrinkles, while geometric deformation allows exaggeration of global features, such as face shape. Conventional style transfer networks aim to reconstruct the image from the feature space with a decoder network. Because the decoder is a set of nonlinear local filters, it is inherently limited in handling spatial variation: when there is a large geometric difference between the input and output domains, the decoder's image quality is poor and information loss is severe. Warping-based methods, on the other hand, are limited by their inability to change content and fine-grained details. Therefore, both the style transfer module and the warping module are essential parts of the learning framework.

As shown in the figure below, without either module the generator cannot close the gap between photos and caricatures; the adversarial balance between the generator and the discriminator is broken, and training collapses.

Therefore, the style transfer and the deformation in this paper must be performed together; it is not enough to find feature points and change only the shape.

I recently saw another paper, CariGANs, which takes a similar approach: it also deforms the face according to feature points. I plan to look into it later.




Posted on Wed, 12 Feb 2020 02:48:10 -0800 by RyanSmith345