A Beginner’s Guide to Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) are deep neural net architectures composed of two nets that are pitted against each other (hence the “adversarial”).

GANs were introduced in a paper by Ian Goodfellow and other researchers at the University of Montreal, including Yoshua Bengio, in 2014. Referring to GANs, Facebook’s AI research director Yann LeCun called adversarial training “the most interesting idea in the last 10 years in ML.”

GANs’ potential is huge, because they can, in principle, learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is impressive, even poignant.

How GANs Work

One neural network, called the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e. the discriminator decides whether each instance of data it reviews belongs to the actual training dataset or not.

Let’s say we’re trying to do something more banal than mimic the Mona Lisa. We’re going to generate hand-written numerals like those found in MNIST. We can start with the MNIST dataset, taken from the real world. The goal of the discriminator, when shown an instance from this dataset, is to recognize it as authentic.

Meanwhile, the generator is creating new images that it passes to the discriminator in the hopes that they, too, will be deemed authentic. The goal of the generator is to generate passable hand-written digits, to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.

Here are the steps a GAN takes:

  • The generator takes in random numbers and returns an image.
  • This generated image is fed into the discriminator alongside a stream of images taken from the actual dataset.
  • The discriminator takes in both real and fake images and returns a probability for each one, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.
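
The three steps above can be sketched in a few lines of plain Python. This is a toy illustration, not a real GAN: the "images" have only four pixels, both networks are single layers with random, untrained weights, and every name and size here (`generator`, `discriminator`, `NOISE_DIM`, `IMAGE_DIM`, the stand-in real image) is an assumption made for the example.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator(noise, weights):
    """Map a noise vector to a flat 'image' with pixel values in (0, 1)."""
    return [sigmoid(sum(w * z for w, z in zip(row, noise))) for row in weights]

def discriminator(image, weights, bias):
    """Return the estimated probability that `image` is real."""
    return sigmoid(sum(w * p for w, p in zip(weights, image)) + bias)

rng = random.Random(0)
NOISE_DIM, IMAGE_DIM = 3, 4  # toy sizes; an MNIST image has 784 pixels

# Untrained, randomly initialized weights (hypothetical single-layer nets).
g_weights = [[rng.gauss(0, 1) for _ in range(NOISE_DIM)] for _ in range(IMAGE_DIM)]
d_weights = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]
d_bias = 0.0

# Step 1: the generator takes in random numbers and returns an image.
noise = [rng.gauss(0, 1) for _ in range(NOISE_DIM)]
fake_image = generator(noise, g_weights)

# Step 2: the fake is fed to the discriminator alongside a real image.
real_image = [0.0, 1.0, 1.0, 0.0]  # stand-in for a sample from the dataset

# Step 3: the discriminator returns one probability per image.
p_real = discriminator(real_image, d_weights, d_bias)
p_fake = discriminator(fake_image, d_weights, d_bias)
print(f"P(authentic | real image) = {p_real:.3f}")
print(f"P(authentic | fake image) = {p_fake:.3f}")
```

Because nothing has been trained yet, the two probabilities carry no meaning at this point. Training is what pushes the first toward 1 and the second toward 0.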

So you have a double feedback loop:

  • The discriminator is in a feedback loop with the ground truth of the images, which we know.
  • The generator is in a feedback loop with the discriminator.

You can think of a GAN as the combination of a counterfeiter and a cop in a game of cat and mouse, where the counterfeiter is learning to pass false notes, and the cop is learning to detect them. Both are dynamic; i.e. the cop is in training, too (maybe the central bank is flagging bills that slipped through), and each side comes to learn the other’s methods in a constant escalation.

The discriminator network is a standard convolutional network that can categorize the images fed to it, a binary classifier labeling images as real or fake. The generator is an inverse convolutional network, in a sense: while a standard convolutional classifier takes an image and downsamples it to produce a probability, the generator takes a vector of random noise and upsamples it to an image. The first throws away data through downsampling techniques like max pooling, and the second generates new data.
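
To make the contrast concrete, here is a small sketch of the two directions on a 4x4 grid of numbers. It assumes 2x2 max pooling for the downsampling side and, as a stand-in for the learned upsampling layers a real generator would typically use, simple nearest-neighbor repetition. The function names are made up for this example.

```python
def max_pool_2x2(image):
    """Downsample: keep only the largest value in each 2x2 patch."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

def upsample_2x(image):
    """Upsample: repeat every value into a 2x2 patch (nearest neighbor)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

image = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 6],
    [1, 0, 7, 8],
]
pooled = max_pool_2x2(image)    # 4x4 -> 2x2: 12 of the 16 values are discarded
restored = upsample_2x(pooled)  # 2x2 -> 4x4: right size, but the detail is gone
print(pooled)
print(restored)
```

Pooling is lossy, so upsampling cannot recover the original grid. A generator has to invent the missing detail, which is why its upsampling layers are learned rather than fixed.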

Each net is trying to optimize a different, opposing objective function, or loss function, in a zero-sum game. This is essentially an actor-critic model. As the discriminator changes its behavior, so does the generator, and vice versa. Their losses push against each other.
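
One way to see the two losses pushing against each other is to write them down for a single real sample and a single generated sample. The sketch below assumes the log-loss (minimax) formulation from the original GAN paper; the function names and the probability values are illustrative only.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Binary cross-entropy: low when D outputs 1 for real and 0 for fake."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Minimax generator objective: low when D is fooled (p_fake near 1)."""
    return math.log(1.0 - p_fake)

# p_real and p_fake are the discriminator's outputs on one real and one
# generated sample (hypothetical values chosen for illustration).
for p_fake in (0.1, 0.5, 0.9):
    d = discriminator_loss(p_real=0.9, p_fake=p_fake)
    g = generator_loss(p_fake)
    print(f"D(G(z)) = {p_fake:.1f}   d_loss = {d:.3f}   g_loss = {g:.3f}")
```

As the discriminator's score for the fake rises, its own loss climbs while the generator's loss falls by exactly the same amount. Under this formulation the two losses always sum to `-log(p_real)`, which the generator cannot influence, and that is the zero-sum structure.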

If you want to learn more about generating images, Brandon Amos wrote a great post about interpreting images as samples from a probability distribution.

Tips in Training a GAN

When you train the discriminator, hold the generator values constant; and when you train the generator, hold the discriminator constant. Each should train against a static adversary. For example, this gives the generator a better read on the gradient it must learn by.
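
A minimal sketch of this alternation, on a one-dimensional toy problem rather than MNIST, might look like the following. Everything here is an assumption for illustration: the generator is a line `a * z + b`, the discriminator is a single logistic unit `sigmoid(w * x + c)`, the "real" data are numbers near 4, and the gradients are written out by hand for the log-loss formulation.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def d_step(d, g, x_real, z, lr):
    """Update the discriminator only; the generator `g` is read, never written."""
    x_fake = g["a"] * z + g["b"]
    p_real = sigmoid(d["w"] * x_real + d["c"])
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Gradients of -log(p_real) - log(1 - p_fake) with respect to w and c.
    grad_w = (p_real - 1.0) * x_real + p_fake * x_fake
    grad_c = (p_real - 1.0) + p_fake
    return {"w": d["w"] - lr * grad_w, "c": d["c"] - lr * grad_c}

def g_step(d, g, z, lr):
    """Update the generator only; the discriminator `d` is read, never written."""
    x_fake = g["a"] * z + g["b"]
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Gradients of log(1 - p_fake) with respect to a and b (chain rule through D).
    grad_a = -p_fake * d["w"] * z
    grad_b = -p_fake * d["w"]
    return {"a": g["a"] - lr * grad_a, "b": g["b"] - lr * grad_b}

rng = random.Random(0)
d = {"w": 0.1, "c": 0.0}
g = {"a": 1.0, "b": 0.0}
for _ in range(200):
    x_real = rng.gauss(4.0, 0.5)  # hypothetical "real" data centered on 4
    d = d_step(d, g, x_real, rng.gauss(0.0, 1.0), lr=0.05)  # generator frozen
    g = g_step(d, g, rng.gauss(0.0, 1.0), lr=0.05)          # discriminator frozen
print(d, g)
```

Each step function returns new parameters for one network and leaves the other untouched, so each side always takes its gradient against a static adversary.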

By the same token, pretraining the discriminator against MNIST before you start training the generator will establish a clearer gradient.

Each side of the GAN can overpower the other. If the discriminator is too good, it will return values so close to 0 or 1 that the generator will struggle to read the gradient. If the generator is too good, it will persistently exploit weaknesses in the discriminator that lead it to accept fakes as real (false positives, if 1 means authentic). This may be mitigated by tuning the nets’ respective learning rates.
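
The first failure mode can be shown with a little arithmetic. Assuming the generator minimizes `log(1 - D(G(z)))`, as in the log-loss formulation, and that the discriminator ends in a sigmoid, the gradient reaching the generator through the discriminator's pre-sigmoid score works out to `-sigmoid(score)`. The sketch below (names are illustrative) prints that gradient as the discriminator grows more confident that a sample is fake.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the minimax generator loss."""
    return -sigmoid(logit)

# `logit` is the discriminator's pre-sigmoid score for a generated sample.
# More negative means the discriminator is more certain the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    p_fake = sigmoid(logit)
    print(f"D(G(z)) = {p_fake:.5f}   gradient = {generator_grad(logit):.5f}")
```

The gradient shrinks toward zero as the discriminator's output approaches 0, so a discriminator that is far ahead leaves the generator with almost no signal to learn from.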

GANs take a long time to train. On a single GPU a GAN might take hours, and on a single CPU more than a day. While difficult to tune and therefore to use, GANs have stimulated a lot of interesting research and writing.

Note: Deeplearning4j’s latest release on Maven does not include GANs, but it will soon be possible to build and use them via auto-differentiation and model import from Keras and TensorFlow, all of which are currently available in the master repository on GitHub.

GAN Use Cases

Notable Papers on GANs

  • [Generative Adversarial Nets] [Paper] [Code](Ian Goodfellow’s breakthrough paper)

Unclassified Papers & Resources

Generating High-Quality Images

  • [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks] [Paper][Code](GAN with convolutional networks)(ICLR)

  • [Generative Adversarial Text to Image Synthesis] [Paper][Code][Code]

  • [Improved Techniques for Training GANs] [Paper][Code](Goodfellow’s paper)

  • [Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space] [Paper][Code]

  • [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks] [Paper][Code]

  • [Improved Training of Wasserstein GANs] [Paper][Code]

  • [Boundary Equilibrium Generative Adversarial Networks Implementation in TensorFlow] [Paper][Code]

  • [Progressive Growing of GANs for Improved Quality, Stability, and Variation] [Paper][Code]

Semi-supervised learning

  • [Adversarial Training Methods for Semi-Supervised Text Classification] [Paper][Note](Ian Goodfellow’s paper)

  • [Improved Techniques for Training GANs] [Paper][Code](Goodfellow’s paper)

  • [Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks] [Paper](ICLR)

  • [Semi-Supervised QA with Generative Domain-Adaptive Nets] [Paper](ACL 2017)

Ensembles

  • [AdaGAN: Boosting Generative Models] [Paper][Code](Google Brain)

Clustering

  • [Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks] [Paper](ICLR)

Image blending

  • [GP-GAN: Towards Realistic High-Resolution Image Blending] [Paper][Code]

Image Inpainting

  • [Semantic Image Inpainting with Perceptual and Contextual Losses] [Paper][Code](CVPR 2017)

  • [Context Encoders: Feature Learning by Inpainting] [Paper][Code]

  • [Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks] [Paper]

  • [Generative face completion] [Paper][Code](CVPR 2017)

  • [Globally and Locally Consistent Image Completion] [MainPAGE](SIGGRAPH 2017)

Joint Probability

Super-Resolution

  • [Image super-resolution through deep learning] [Code](Just for face dataset)

  • [Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network] [Paper][Code](Using Deep residual network)

  • [EnhanceGAN] [Docs][Code]

De-occlusion

  • [Robust LSTM-Autoencoders for Face De-Occlusion in the Wild] [Paper]

Semantic Segmentation

  • [Adversarial Deep Structural Networks for Mammographic Mass Segmentation] [Paper][Code]

  • [Semantic Segmentation using Adversarial Networks] [Paper](Soumith’s paper)

Object Detection

  • [Perceptual generative adversarial networks for small object detection] [Paper](CVPR 2017)

  • [A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection] [Paper][Code](CVPR 2017)

RNN-GANs

  • [C-RNN-GAN: Continuous recurrent neural networks with adversarial training] [Paper][Code]

Conditional Adversarial Nets

  • [Conditional Generative Adversarial Nets] [Paper][Code]

  • [InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets] [Paper][Code][Code]

  • [Conditional Image Synthesis With Auxiliary Classifier GANs] [Paper][Code](GoogleBrain ICLR 2017)

  • [Pixel-Level Domain Transfer] [Paper][Code]

  • [Invertible Conditional GANs for image editing] [Paper][Code]

  • [Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space] [Paper][Code]

  • [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks] [Paper][Code]

Video Prediction & Generation

  • [Deep multi-scale video prediction beyond mean square error] [Paper][Code](Yann LeCun’s paper)

  • [Generating Videos with Scene Dynamics] [Paper][Web][Code]

  • [MoCoGAN: Decomposing Motion and Content for Video Generation] [Paper]

Texture Synthesis & Style Transfer

  • [Precomputed real-time texture synthesis with markovian generative adversarial networks] [Paper][Code](ECCV 2016)

Image Translation

  • [Unsupervised Cross-Domain Image Generation] [Paper][Code]

  • [Image-to-image translation using conditional adversarial nets] [Paper][Code][Code]

  • [Learning to Discover Cross-Domain Relations with Generative Adversarial Networks] [Paper][Code]

  • [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks] [Paper][Code]

  • [CoGAN: Coupled Generative Adversarial Networks] [Paper][Code](NIPS 2016)

  • [Unsupervised Image-to-Image Translation with Generative Adversarial Networks] [Paper]

  • [Unsupervised Image-to-Image Translation Networks] [Paper]

  • [Triangle Generative Adversarial Networks] [Paper]

GAN Theory

  • [Energy-based generative adversarial network] [Paper][Code](LeCun’s paper)

  • [Improved Techniques for Training GANs] [Paper][Code](Goodfellow’s paper)

  • [Mode Regularized Generative Adversarial Networks] [Paper](Yoshua Bengio, ICLR 2017)

  • [Improving Generative Adversarial Networks with Denoising Feature Matching] [Paper][Code](Yoshua Bengio, ICLR 2017)

  • [Sampling Generative Networks] [Paper][Code]

  • [How to Train GANs] [Docu]

  • [Towards Principled Methods for Training Generative Adversarial Networks] [Paper](ICLR 2017)

  • [Unrolled Generative Adversarial Networks] [Paper][Code](ICLR 2017)

  • [Least Squares Generative Adversarial Networks] [Paper][Code](ICCV 2017)

  • [Wasserstein GAN] [Paper][Code]

  • [Improved Training of Wasserstein GANs] [Paper][Code](An improvement on WGAN)

  • [Generalization and Equilibrium in Generative Adversarial Nets] [Paper](ICML 2017)

3-Dimensional GANs

  • [Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling] [Paper][Web][Code](NIPS 2016)

  • [Transformation-Grounded Image Generation Network for Novel 3D View Synthesis] [Web](CVPR 2017)

Music

  • [MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions] [Paper][HOMEPAGE]

Face Generation & Editing

  • [Autoencoding beyond pixels using a learned similarity metric] [Paper][Code][Tensorflow code]

  • [Coupled Generative Adversarial Networks] [Paper][Caffe Code][Tensorflow Code](NIPS)

  • [Invertible Conditional GANs for image editing] [Paper][Code]

  • [Learning Residual Images for Face Attribute Manipulation] [Paper][Code](CVPR 2017)

  • [Neural Photo Editing with Introspective Adversarial Networks] [Paper][Code](ICLR 2017)

  • [Neural Face Editing with Intrinsic Image Disentangling] [Paper](CVPR 2017)

  • [GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data] [Paper][Code](BMVC 2017)

  • [Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis] [Paper](ICCV 2017)

For Discrete Distributions

  • [Maximum-Likelihood Augmented Discrete Generative Adversarial Networks] [Paper]

  • [Boundary-Seeking Generative Adversarial Networks] [Paper]

  • [GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution] [Paper]

Improving Classification & Recognition

  • [Generative OpenMax for Multi-Class Open Set Classification] [Paper](BMVC 2017)

  • [Controllable Invariance through Adversarial Feature Learning] [Paper][Code](NIPS 2017)

  • [Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro] [Paper][Code](ICCV 2017)

  • [Learning from Simulated and Unsupervised Images through Adversarial Training] [Paper][Code](Apple paper, CVPR 2017 Best Paper)

Projects

  • [cleverhans] [Code](A library for benchmarking vulnerability to adversarial examples)

  • [resnet-cppn-gan-tensorflow] [Code](Using Residual Generative Adversarial Networks and Variational Auto-encoder techniques to produce high-resolution images)

  • [HyperGAN] [Code](Open source GAN focused on scale and usability)

Tutorials
