Single image super-resolution is one application of deep learning that can help overcome the problems created by a lack of resolution in images when building a computer vision application. The goal of single image super-resolution is to scale up small images while keeping the drop in image quality to a minimum. The predominant deep learning architecture used to accomplish this is the Generative Adversarial Network, or GAN.
To understand how super-resolution can empower your computer vision systems, let’s take a close look at deep learning and single image super-resolution.
Deep learning is a subdiscipline of machine learning. Most deep learning methods make use of neural network architectures, models inspired by the human brain. The “deep” part of deep learning comes from the fact that deep learning models are composed of many “layers” of nodes (neurons), and each of these neurons applies a mathematical function that transforms the data in some way.
Traditional neural network configurations have only one or two hidden layers, but deep neural networks can have hundreds. The deeper a neural network is, the more complex the patterns it can learn and the more it can accomplish.
There are a variety of deep neural network architectures. One of the most commonly used is the Convolutional Neural Network, or CNN. Convolutional neural networks are designed to process two-dimensional data, which makes them well suited to handling image data. A CNN has three critical building blocks: convolutional layers, nonlinear activation functions, and pooling layers. Because CNNs excel at handling image data, they are used in many computer vision tasks, including single image super-resolution.
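To make those three building blocks concrete, here is a minimal numpy sketch of a single CNN "stage": a convolution, a ReLU activation, and a max pooling step. The tiny edge-detecting kernel and the 4x4 input are illustrative choices, not part of any real architecture.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (strictly, cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear activation: zero out negative responses."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Downsample by taking the maximum in each size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 "image"
edge_kernel = np.array([[-1.0, 1.0]])                # horizontal gradient filter
features = max_pool(relu(conv2d(image, edge_kernel)))
```

Real CNNs stack many such stages with learned kernels; the point here is only the order of operations: convolve, activate, pool.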
Another type of deep learning architecture is a Generative Adversarial Network, and these networks are used in super-resolution problems.
Super-resolution is the process of upscaling an image, improving its detail and resolution. It is usually applied to low-resolution images in situations where a high-resolution image is needed. The high-resolution output has its fine details filled in, even though those details are essentially unknown. The network must take in an input image and make guesses about the likely pixel values of the super-resolution image based only on the pixel information in that input.
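For contrast, the simplest possible upscaling makes no guesses at all: nearest-neighbor interpolation just repeats each pixel. This numpy sketch shows that baseline, which produces the blocky results that learned super-resolution aims to improve on.

```python
import numpy as np

def nearest_neighbor_upscale(image, factor):
    """Upscale by repeating each pixel `factor` times along each axis."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

low_res = np.array([[10, 20],
                    [30, 40]])
up = nearest_neighbor_upscale(low_res, 2)  # 2x2 -> 4x4, each pixel becomes a 2x2 block
```

No new information is created here; a super-resolution network instead predicts plausible detail for the new pixels.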
Deep neural network based super-resolution is useful primarily because it addresses problems found in other image upscaling techniques. Other upscaling methods produce images that lack fine detail, and they can’t deal with compression artifacts or other image defects. Deep neural network based super-resolution also enables efficient image transfer between networks: a compressed version of the image can be sent and then upscaled once received. Models trained for super-resolution can likewise be used to repair defects in an image, such as removing compression artifacts and corrupted pixels.
Super-resolution is often used for tasks like medical image processing, enhancing compressed images and video, and analyzing the content of aerial or satellite images.
Traditional super-resolution makes use of multiple low-resolution images, extracting information from each image and combining those bits of information to create an upscaled version of the scene. Single image super-resolution is exactly what it sounds like: performing super-resolution on a single image. This proves much more difficult than traditional, multi-image super-resolution, since there is far less information to work with. One way to solve this problem is with a kind of deep neural network called a Generative Adversarial Network.
Generative Adversarial Networks are a deep learning architecture built by combining two networks. GANs can generate new instances of a given class, creating images that resemble whatever kind of image you are interested in. This is accomplished by pitting the two networks against each other in a zero-sum competition.
One part of the GAN is a discriminative network, while the other part is a generative network. The generative network creates fake images designed to look like images of the target class. These fake, generated images are then passed to the discriminative network. The discriminative network’s job is to detect the counterfeit images, while the generative network’s job is to create images realistic enough to be classified as genuine by the discriminative network.
Both networks undergo training as they are pitted against each other. The discriminative network sees the results of the generator network, so it adapts to the generator’s improving output. This, in turn, forces the generator network to improve its generated images. This constant back and forth is what makes both networks improve and leads to more realistic super-resolution images.
Single image super-resolution can be done with a GAN. The process begins with finding suitable high-resolution images to train on. Copies of these high-resolution images are then processed to create down-sampled, low-resolution versions, so the training set contains matching low-resolution and high-resolution images.
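Building those training pairs can be sketched in a few lines of numpy: each high-resolution image is average-pooled down by the target scale factor, and the (low-resolution, high-resolution) pair is stored. The 4x factor and 32x32 size here are arbitrary illustrative choices.

```python
import numpy as np

def downsample(hr, factor):
    """Average-pool a high-resolution image to create a low-resolution copy."""
    h, w = hr.shape[0] // factor, hr.shape[1] // factor
    return hr[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
hr_images = [rng.random((32, 32)) for _ in range(4)]       # stand-ins for real photos
training_pairs = [(downsample(hr, 4), hr) for hr in hr_images]
```

Real pipelines typically use bicubic downsampling and may add blur or noise, but the principle is the same: the network only ever sees degraded inputs whose ground truth is known.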
After the complete training set is created, the low-resolution images are passed through the generator portion of the GAN, producing the upscaled (super-resolution) images. Finally, the upscaled images are passed to the discriminative portion of the GAN, which tries to distinguish the genuine high-resolution images from the super-resolution images; the difference between the two (the loss) is then fed back to the networks for further training.
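The structure of one such training step can be sketched as follows. The `generator` and `discriminator` here are deliberately trivial placeholders (a naive upscale and a sigmoid score); in a real system both would be trained neural networks, and the losses would drive gradient updates.

```python
import numpy as np

def generator(lr):
    """Placeholder generator: naive 4x nearest-neighbor upscale."""
    return np.repeat(np.repeat(lr, 4, axis=0), 4, axis=1)

def discriminator(img):
    """Placeholder discriminator: returns P(image is a real HR image)."""
    return 1.0 / (1.0 + np.exp(-img.mean()))  # sigmoid of a crude statistic

def train_step(lr, hr):
    sr = generator(lr)              # upscale the low-resolution input
    p_real = discriminator(hr)      # score the genuine high-res image
    p_fake = discriminator(sr)      # score the generated image
    # Standard GAN objectives: the discriminator wants p_real -> 1 and
    # p_fake -> 0, while the generator wants p_fake -> 1.
    d_loss = -np.log(p_real) - np.log(1.0 - p_fake)
    g_loss = -np.log(p_fake)
    return sr, d_loss, g_loss

rng = np.random.default_rng(0)
hr = rng.random((32, 32))
lr = hr[::4, ::4]                   # degraded input for this step
sr, d_loss, g_loss = train_step(lr, hr)
```

The back-and-forth described above is exactly this loop repeated: each network's loss falls only at the other's expense.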
The optimization target for non-GAN super-resolution algorithms is typically the mean squared error between the upscaled image and the original, ground-truth image. While other deep neural networks frequently have only one loss component, a GAN has two: content loss and adversarial loss. The loss function used by the generative component of the network is a weighted sum of the adversarial loss and the content loss.
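That weighted sum is simple to write down. In this sketch the content loss is pixel-wise MSE and the adversarial weight of 1e-3 follows the commonly used SRGAN-style scaling; both choices are illustrative, and other papers use feature-based content losses and different weights.

```python
import numpy as np

def mse_content_loss(sr, hr):
    """Pixel-wise mean squared error between upscaled and ground-truth images."""
    return np.mean((sr - hr) ** 2)

def adversarial_loss(p_fake):
    """Generator's adversarial term: small when the discriminator is fooled."""
    return -np.log(p_fake)

def generator_loss(sr, hr, p_fake, adv_weight=1e-3):
    # Weighted sum of content loss and (down-scaled) adversarial loss.
    return mse_content_loss(sr, hr) + adv_weight * adversarial_loss(p_fake)

sr = np.zeros((4, 4))               # a (bad) generated image
hr = np.ones((4, 4))                # the ground truth
loss = generator_loss(sr, hr, p_fake=0.5)
```

The small adversarial weight keeps the generator anchored to the ground truth while still rewarding outputs that fool the discriminator.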
When designing a GAN for single image super-resolution, it is common to use “skip connections” between layers of the generator network. These skip connections give gradients a shortcut path past intermediate layers, and thus they help combat the vanishing gradient problem.
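A skip connection is easiest to see in a residual block, where the input is added back onto the block's output. This numpy sketch uses plain matrix multiplies as stand-in "layers"; zeroing the first layer's weights shows that the skip path preserves the input even when the block itself contributes nothing, which is what keeps signals (and gradients) flowing.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """Two stand-in layers with a skip connection: output = x + F(x)."""
    out = relu(x @ w1)   # first layer + activation
    out = out @ w2       # second layer
    return x + out       # skip connection adds the input back

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
w1 = np.zeros((8, 8))                  # zeroed weights: F(x) contributes nothing
w2 = rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)          # thanks to the skip, y equals x
```

In a real super-resolution generator the layers are convolutions, but the identity shortcut works the same way.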
To simplify the training of the model, a pre-trained architecture like the VGG-19 network can be used as a feature extractor. That way, the network can compare feature values rather than individual pixel values. When a VGG network is used as a feature extractor, the content loss is the Euclidean distance between the feature representations of the reconstructed image and the reference image.
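The distance itself is straightforward once the features are extracted. In this sketch the two small vectors merely stand in for VGG-19 activations, which in practice are large multi-channel feature maps pulled from an intermediate layer of the pre-trained network.

```python
import numpy as np

def feature_loss(feat_sr, feat_ref):
    """Euclidean distance between two feature representations."""
    return np.sqrt(np.sum((feat_sr - feat_ref) ** 2))

# Hypothetical feature vectors standing in for VGG-19 activations.
feat_sr = np.array([1.0, 2.0, 3.0])    # features of the reconstructed image
feat_ref = np.array([1.0, 2.0, 5.0])   # features of the reference image
loss = feature_loss(feat_sr, feat_ref)
```

Comparing in feature space penalizes perceptually meaningful differences (textures, edges) rather than exact pixel agreement, which is why it tends to produce sharper results than raw MSE.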
Using GANs for single image super-resolution can help you design apps that remove image defects, process medical images, enhance highly compressed images and videos, or analyze aerial and satellite imagery. The effectiveness of GANs can be enhanced by combining them with CNNs, in an architecture referred to as a DCGAN.
In addition to setting up your GAN for single image super-resolution, you should make sure that your images are properly annotated for computer vision and deep learning. There are many useful image annotation tools you can use, or you can outsource your image annotation to trained professionals.