Everything about Autoencoders
Autoencoders are a type of neural network used in unsupervised learning. An autoencoder consists of two main components: an encoder and a decoder. The encoder compresses the input data into a lower-dimensional vector, and the decoder then reconstructs the input from that vector, so the target output is the same as the input. Autoencoders are feature-selective: they learn to prioritize the important features in the data. As a result, the output is only an approximate copy of the input, which is exactly what we want, because a network that simply copied its input would learn nothing useful. Autoencoders are commonly used for dimensionality reduction or as generative models.

Please note that the bottleneck hidden layer is also known as the code layer.
Understanding autoencoders intuitively
When we learn about autoencoders, the first question that comes to mind is: why do we need a network that just reconstructs its input?
The true essence of an autoencoder lies in its latent space representation: a representation of the important features present in the input. Suppose I show you a model and ask you to draw it. If the model is a tree, you will draw the trunk, the green leaves, and the roots; these are the latent attributes of the tree. If I show you a house, you will draw a cuboidal structure with a slanting roof, a door, and windows; these are the latent attributes of a house that you reconstruct on paper. In both cases you keep the important features in mind and ignore unimportant details, such as flowers or bird nests, that may be present in either model. An autoencoder lets a machine do the same thing.
Architecture & Training
As you can see above, the autoencoder consists of an encoder that encodes the input and a decoder that reconstructs it. The number of neurons in the input layer is equal to the number in the output layer. An autoencoder with a bottleneck between the input and output layers is called undercomplete.
Training an autoencoder involves backpropagating the reconstruction error, so you can train it like a normal neural network by defining the cost function, the optimizer, and the activation functions (see our post).
![Output(below) is an approximate copy of input(above) Dataset: Fashion MNIST[1]](https://d2np0n0gxnqrjl.cloudfront.net/wp-content/uploads/2022/03/basic-1024x213.jpg)
Normal autoencoders minimize the loss function below [2],

L(x, g(f(x)))

where L is a loss function penalizing g(f(x)) for being dissimilar from x, g(h) is the decoder output, and h = f(x) is the encoder output.
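In practice, this is just standard supervised training in which the input doubles as the target. Below is a minimal sketch of an undercomplete autoencoder on Fashion-MNIST using Keras; the layer sizes, the 32-dimensional code, and the MSE loss are illustrative choices, not the only valid configuration.

```python
# A minimal sketch of an undercomplete autoencoder in Keras.
# Hyperparameters here are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load Fashion-MNIST and scale pixels to [0, 1]; labels are unused
# because training is unsupervised.
(x_train, _), (x_test, _) = keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

latent_dim = 32  # size of the code layer (the bottleneck)

# Encoder f: compresses the 784-pixel input into a 32-dim code h = f(x).
encoder = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),
], name="encoder")

# Decoder g: reconstructs the input from the code, g(h).
decoder = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
], name="decoder")

# Full autoencoder g(f(x)); note the target is the input itself.
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")  # L(x, g(f(x)))
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))
```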
Applications of Autoencoders
Following is an example of similar image search: given a query image, the system returns the top N most similar images from a dataset.

The raw input images are passed through the encoder network to obtain compressed encodings. The autoencoder's weights are learned by reconstructing each image from its encoding with the decoder network. Once trained, the compressed embeddings of the dataset can be compared against the encoded version of the query image.
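Here is a minimal sketch of that search step, reusing the `encoder` and `x_test` from the snippet above; the cosine-similarity metric and the `top_n_similar` helper are illustrative assumptions, not part of any fixed API.

```python
# A minimal sketch of similar-image search over encoder embeddings.
# `encoder` and `x_test` come from the previous snippet.
import numpy as np

# Encode the whole dataset once; each image becomes a 32-dim embedding.
embeddings = encoder.predict(x_test)

def top_n_similar(query_image, n=5):
    """Return indices of the n dataset images most similar to the query."""
    q = encoder.predict(query_image.reshape(1, -1))[0]
    # Cosine similarity between the query embedding and all embeddings.
    sims = embeddings @ q / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(sims)[::-1][:n]

# Example: the five images most similar to the first test image
# (the top hit will be the query image itself).
print(top_n_similar(x_test[0], n=5))
```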

Other applications of Autoencoders:
- Dimensionality reduction: Autoencoders can outperform PCA (principal component analysis) because of their non-linearity.
- Information retrieval: Retrieval can be done more efficiently once the data is transformed into a low-dimensional space.
- Anomaly detection: Autoencoders can flag mislabeled data points or inputs that fall well outside the typical data distribution, since such inputs reconstruct poorly.
- Image processing: Autoencoders are used to denoise images (see the sketch after this list) and in more demanding contexts such as medical imaging and super-resolution.
- Recommendation systems: Deep autoencoders can be used to learn user preferences and recommend movies, books, or other items.
- Image generation: Variational autoencoders can generate images: given input images such as faces or scenery, the system generates similar images.
- Sequence-to-sequence prediction: LSTM-based autoencoders that capture temporal structure can be applied to machine translation, predicting the next frame of a video, or generating synthetic videos.
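As a concrete example of the denoising use case, here is a minimal sketch that reuses the autoencoder defined earlier: the inputs are corrupted with Gaussian noise (the 0.3 noise level is an illustrative assumption), while the clean images remain the training targets.

```python
# A minimal sketch of denoising training, reusing `autoencoder`,
# `x_train`, and `x_test` from the first snippet.
noise = 0.3  # illustrative noise level, not a recommended value
x_train_noisy = np.clip(
    x_train + noise * np.random.normal(size=x_train.shape),
    0.0, 1.0).astype("float32")
x_test_noisy = np.clip(
    x_test + noise * np.random.normal(size=x_test.shape),
    0.0, 1.0).astype("float32")

# The network learns to map noisy inputs back to the clean originals.
autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=256,
                validation_data=(x_test_noisy, x_test))
```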
There are various types of autoencoders. We will explore these in the next section.
References
[1]Han Xiao, Kashif Rasul, & Roland Vollgraf (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR, abs/1708.07747.
[2]Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press.
Further Reading
- https://www.mygreatlearning.com/blog/autoencoder/
- https://www.deeplearningbook.org/contents/autoencoders.html#pf14
- https://blog.keras.io/building-autoencoders-in-keras.html