Recently, we have been witnessing the increasing popularity of various tools that autonomously create images of people, objects, or scenes that never existed, or modify representations of existing objects by adding qualities they never had.
A prime example of such a solution is FaceApp. The app can take a picture of a person and create images of their face enhanced with extra features. For example, it can add a beard, make the person look older or younger, or add facial features that normally aren’t there.
All of that is possible thanks to GANs (Generative Adversarial Networks). Today, I’d like to explore this topic and introduce you to GANs and related technologies.
GAN – Generative Adversarial Networks
First, let’s investigate the mechanism behind Generative Adversarial Networks. Imagine two neural networks trained against each other: the first one, called the “discriminator,” learns to tell real images from generated ones; the second one, the “generator,” learns how to produce images.
The two models play an adversarial game in the game-theoretic sense. The generator’s goal is to trick the discriminator, while the discriminator tries to avoid being tricked by learning from samples of both real and generated images.
As both players learn, they become more skilled in the game. The generator produces more convincing images, and the discriminator gets better at distinguishing between real and fake samples. The generative model is considered fully trained when the generator produces counterfeit images so realistic that the discriminator can no longer tell them apart from real ones. At that point, the model can generate highly realistic images on demand.
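To make the game a little more concrete, here is a minimal sketch of the two losses the players try to minimize, assuming the discriminator outputs a probability that an image is real. The function names are my own, and the generator loss shown is the commonly used "non-saturating" variant, so treat this as an illustration rather than a full training recipe.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real images should score 1, generated images 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: the generator wants D(G(z)) close to 1."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_fake + eps)))

# Early in training: the discriminator spots fakes easily.
early = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# Fully trained generator: the discriminator can only guess (0.5 everywhere).
equilibrium = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(early, equilibrium)
```

When the discriminator can only guess, its loss settles at 2·ln 2 (about 1.39), which is the "can no longer tell them apart" situation described above.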
GAN was invented in 2014 by Ian Goodfellow and his team. Together, they published an exciting research paper describing the concept.
Greater Stability, Speed, and Resolution — Enhancing GAN
GANs, though disruptive, had their challenges. The generated images were sharp only at low resolutions and weren’t sufficiently diverse. And despite ongoing research and development, the training process remained unstable.
Three years after GANs first appeared, Tero Karras and his team came up with a new training method and described it in the paper “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” The method grows the generator and the discriminator progressively, adding layers to both as training advances.
The researchers began training the networks on low-resolution images and gradually increased the resolution by adding successive layers. This incremental approach allowed the networks first to discover the large-scale structure of the image distribution, and then to focus on increasingly fine details, instead of learning everything in a single go. It led to some striking results in the generation of highly realistic images of human faces.
Additionally, the method greatly reduced training time: training became 2 to 6 times faster, depending on the target resolution. Thanks to the high photorealism of the generated images, the method also accelerated the development of the technology and led to new applications.
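The growth step can be sketched roughly as follows. This assumes the resolution doubles at every step and that a newly added layer is blended in smoothly with a weight alpha that ramps from 0 to 1; the helper names, the 4-to-1024 defaults, and the random arrays standing in for network outputs are illustrative assumptions, not the authors’ code.

```python
import numpy as np

def resolution_schedule(start=4, target=1024):
    """Resolutions visited when the image size doubles at each growth step."""
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * 2)
    return sizes

def upsample_2x(image):
    """Nearest-neighbour upsampling of an (H, W, C) image to (2H, 2W, C)."""
    return image.repeat(2, axis=0).repeat(2, axis=1)

def fade_in(low_res_output, new_layer_output, alpha):
    """Blend the upsampled old output with the new, higher-resolution layer.

    alpha ramps from 0 (new layer ignored) to 1 (new layer fully active).
    """
    return (1.0 - alpha) * upsample_2x(low_res_output) + alpha * new_layer_output

rng = np.random.default_rng(0)
old = rng.random((4, 4, 3))   # stand-in for the 4x4 generator output
new = rng.random((8, 8, 3))   # stand-in for the freshly added 8x8 layer
blended = fade_in(old, new, alpha=0.3)
print(resolution_schedule(), blended.shape)
```

Blending the new layer in gradually keeps the already trained low-resolution layers from being disturbed when the network suddenly gets bigger.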
The images below, taken from a research paper on AI and ML development, best demonstrate how Karras’s breakthrough has impacted the development of GANs over the last few years.
Let’s now discuss the latest craze in apps: age progression and regression. In 2017, several studies were conducted on the topic. In one of them, a team of researchers from the University of Tennessee proposed the Conditional Adversarial Autoencoder (CAAE).
Unlike earlier models, CAAE didn’t require a massive collection of images of the same person’s face at different ages. Instead, it assumed that faces lie on a high-dimensional manifold, where a face can be made to look older or younger by moving along the age direction, without losing the person’s characteristic features.
The CAAE network consists of an encoder, a generator, and two discriminators that together deliver highly realistic representations of a face at different ages.
How Does Conditional Adversarial Autoencoder Work?
The encoder ‘E’ maps a face image to the vector ‘z’ (personality). Concatenating ‘z’ with the age label ‘l’ creates a new latent vector [z,l], which is the input for the generator ‘G.’
Both the encoder and the generator are updated based on the ‘L2’ discrepancy between the input and output faces. The discriminator ‘Dz’ imposes a uniform distribution on ‘z,’ while the discriminator ‘Dimg’ forces the output face to be photorealistic and plausible for the given age label.
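Here is a small sketch of how the latent vector [z,l] could be assembled and how the L2 discrepancy could be measured. The vector size, the number of age groups, the one-hot encoding of the label, and the random vector standing in for the encoder output are all illustrative assumptions; the real encoder and generator are neural networks that are not shown here.

```python
import numpy as np

Z_DIM = 50            # illustrative size of the personality vector z
NUM_AGE_GROUPS = 10   # illustrative number of age brackets

def one_hot(age_group, num_groups=NUM_AGE_GROUPS):
    """Encode an age bracket index as a one-hot label l."""
    label = np.zeros(num_groups)
    label[age_group] = 1.0
    return label

def latent_input(z, age_group):
    """Concatenate z with the age label l to form the generator input [z, l]."""
    return np.concatenate([z, one_hot(age_group)])

def l2_reconstruction_loss(input_face, output_face):
    """Mean squared difference between the input and the regenerated face."""
    return float(np.mean((input_face - output_face) ** 2))

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=Z_DIM)   # stand-in for the encoder output E(face)
younger = latent_input(z, age_group=1)
older = latent_input(z, age_group=8)
print(younger.shape, older.shape)
```

Note that ‘younger’ and ‘older’ share exactly the same ‘z’ part and differ only in the label, which is how the same person can be rendered at different ages.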
You can find a detailed explanation of this mechanism together with the research paper describing it here.
CAAE delivers highly realistic face progression to any chosen age, which makes it applicable in face recognition systems, entertainment, and marketing.
Another interesting application of GANs I’d like to mention is the ability to generate images from a text description of what they should show.
In 2016, Han Zhang and his co-authors published the paper “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks,” which presented the concept of using GANs to translate text into photos. Today, the approach is known as Stacked Generative Adversarial Networks (StackGAN).
StackGAN networks can generate photorealistic images in 256 x 256 px resolution from text descriptions. This elaborate process involves two stages:
Stage-I GAN (the upper part of the diagram) sketches a primitive shape of an object and applies colors to it, based on the provided text description, producing a low-resolution image.
Stage-II GAN takes this image and the original description as input and generates high-resolution images with realistic details. Additionally, it can remove various faults and aberrations generated in the first stage, and refine the image by adding tiny yet important details.
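The data flow between the two stages can be sketched like this. Everything below is a placeholder: the text encoder and both generators are stubs that only return arrays of the right shape, and the 64 x 64 px intermediate size, the embedding length, and the noise length are assumptions made for illustration. The point is only to show what each stage receives and returns.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(description, dim=128):
    """Placeholder text encoder: returns a random vector of length dim."""
    return rng.standard_normal(dim)

def stage_one(text_embedding, noise):
    """Placeholder Stage-I generator: a rough 64x64 RGB image."""
    return rng.random((64, 64, 3))

def stage_two(low_res_image, text_embedding):
    """Placeholder Stage-II generator: a 256x256 RGB image.

    A trained network would add detail and correct defects; this stub only
    upsamples by a factor of 4 so that the shapes match.
    """
    return low_res_image.repeat(4, axis=0).repeat(4, axis=1)

description = "a small bird with a red head and white belly"
embedding = embed_text(description)
low_res = stage_one(embedding, noise=rng.standard_normal(100))
high_res = stage_two(low_res, embedding)   # the text is used again here
print(low_res.shape, high_res.shape)
```

The detail worth noticing is that Stage-II sees the text description a second time, so it can recover details the first stage missed.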
Here’s a sample of what this method can do:
Lately, I have been thinking about the use of GANs in film-making. Imagine that one day, entire films will be generated by AI and delivered to us in real time. You’ll get precisely the films you expect, tailored to your unique taste, where movie characters will be able to perform any stunt without special effects.
Research is already under way to make that vision come true. In 2016, Carl Vondrick and his co-authors published a research paper titled “Generating Videos with Scene Dynamics,” describing a model that generates short video sequences.
To achieve that, Vondrick used a GAN with a spatio-temporal convolutional architecture that untangles each scene’s foreground from its background.
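One way to picture the foreground/background split is as a per-pixel blend: a mask decides, for every pixel of every frame, whether to show the moving foreground or a static background image. The sketch below uses random arrays in place of network outputs, and the clip size (32 frames of 64 x 64 px) and function name are illustrative assumptions.

```python
import numpy as np

def compose_video(foreground, mask, background):
    """Blend a moving foreground with a static background.

    foreground: (T, H, W, 3) per-frame pixels
    mask:       (T, H, W, 1) values in [0, 1]; 1 selects the foreground
    background: (H, W, 3) single image, broadcast over all T frames
    """
    return mask * foreground + (1.0 - mask) * background[np.newaxis]

rng = np.random.default_rng(0)
T, H, W = 32, 64, 64
foreground = rng.random((T, H, W, 3))
background = rng.random((H, W, 3))
mask = np.zeros((T, H, W, 1))
mask[:, 16:48, 16:48] = 1.0   # foreground occupies the central square
video = compose_video(foreground, mask, background)
print(video.shape)
```

Because the background is a single image shared by all frames, the model only has to learn motion for the foreground.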
The technology devised by Vondrick still requires some work, but we can already envisage the opportunities it will present in the future.
Possible GAN Applications
GANs may find a wide range of applications in the near future. They could bring efficiencies to numerous tasks and processes in entertainment, machine design, architecture, sales, and advertising. Here are a few possible scenarios.
Here’s how it goes: you log in to your Netflix account and get content that has been generated specially for you. None of the scenes or actors are real; they have never existed, but they are so realistic that you cannot tell them from the real thing. Because everything is generated, characters can have skills and abilities that no human possesses and do incredible things without any CGI.
So while the entire movie is highly realistic, the time and cost of its production drop significantly. Going one step further, once the AI-based solution masters all human features and gestures, it will be able to create breathtaking pictures dynamically, on demand.
Architecture and Interior Design
The second example of GAN’s use I’d like to discuss is its application in engineering and architecture. We can imagine at least a few practical use cases here. For instance, let’s assume you want to design or redecorate your flat.
First, you open a mobile app that scans your apartment through your smartphone camera. Then, you choose the design style. The underlying mechanism generates a real-time preview showing what the design would look like in your space. At the same time, it provides you with a list of the furniture and appliances used in the design, together with prices and the stores where you can purchase each item.
In construction, a similar solution could generate plans and visualizations much faster and in more detail than humans currently produce them.
E-commerce is another industry that can significantly benefit from GANs. Imagine an online fashion business. As a customer, you can upload your picture to the store’s app and check how you would look in various outfits by ‘trying them on’ virtually, without leaving home. Such a solution allows for easier and faster decision-making; some brands are already implementing it in their e-stores.
3D Design and Models
GANs also present an excellent opportunity for 3D design and modeling. Many design processes that are currently performed manually could be automated with generative models. Combined with 3D printing, they could form a fully autonomous system for the design and manufacturing of various objects and devices. How will it work? That’s simple. You’ll launch an application, tell the system what output you need, and wait for it to create the desired artifact.