Types of Autoencoders

Autoencoders are a type of neural network used in unsupervised learning. They encode the input data into a lower-dimensional vector and then attempt to reconstruct the input from that vector, so the target output is the input itself. Because the compressed vector cannot hold everything, they are feature selective: they have to prioritize and learn the important features in the data.
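
To make the encode-then-reconstruct idea concrete, here is a minimal sketch in plain Python. The layer sizes, the random untrained weights, and the helper names (`matvec`, `encode`, `decode`, `reconstruction_error`) are all assumptions made for illustration; a real autoencoder would use a deep learning framework and would train the weights to minimize the reconstruction error.

```python
import random

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def encode(x, enc_weights):
    """Map the input to a lower-dimensional code (linear encoder, no bias)."""
    return matvec(enc_weights, x)

def decode(code, dec_weights):
    """Map the code back to the input space (linear decoder, no bias)."""
    return matvec(dec_weights, code)

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hypothetical sizes: 6 input values squeezed into a 2-value code.
rng = random.Random(0)
input_dim, code_dim = 6, 2
enc_weights = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(code_dim)]
dec_weights = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(input_dim)]

x = [0.2, 0.9, 0.4, 0.1, 0.7, 0.5]
code = encode(x, enc_weights)          # lower-dimensional vector
x_hat = decode(code, dec_weights)      # attempted reconstruction
error = reconstruction_error(x, x_hat) # the quantity training would minimize
```

Note that the training target is `x` itself, which is why no labels are needed.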

You can read more about them in this post.

There are different types of autoencoders. Let us learn about a few commonly used ones: Regularized Autoencoders and Variational Autoencoders.

Regularized Autoencoders

We cannot allow an autoencoder to simply copy the input to the output, so we usually shrink the encoder, the decoder, and the hidden layer. But this limits the model’s capacity. Regularized autoencoders avoid that trade-off: instead of restricting the size of the network, they add a term to the loss function that discourages copying. The model is encouraged to have the following properties [2]:

  • the sparsity of the representation
  • the smallness of the derivative of the representation
  • robustness to noise or to missing inputs

Keeping these properties in mind, we get three types of autoencoders: Sparse, Denoising, and Contractive.

Sparse Autoencoders

We know how regular autoencoders work. What is different about sparse autoencoders is that they place a sparsity constraint on the hidden units. This constraint is a penalty applied to the neurons to create a bottleneck. The penalty ensures that only a small number of neurons are activated for a given input (i.e. it directly affects the activations of the neurons), which forces the model to learn the distinctive statistical features of the data. You can think of the penalty as a regularizer; the difference is that a typical regularizer acts on the weights of a neuron, while this penalty acts on its activations.

The formula given below describes the average activation of a neuron in the hidden layer.

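$$ \hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}(x^{(i)}) $$
Average activation

Here, a_j^{(2)}(x) is the activation of hidden neuron j in layer 2 for the input x, and m is the number of training examples x^{(1)}, ..., x^{(m)}.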

We enforce the following constraint where ρ is the sparsity parameter. The value of ρ is close to zero.

$$ \hat{\rho}_j = \rho $$
Sparsity constraint (the average activation \hat{\rho}_j of each hidden neuron j is held close to ρ)
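
As a rough illustration of the average activation, the sketch below computes it for a small batch in plain Python. The activation values, the choice ρ = 0.05, and the function name `average_activations` are made up for the example.

```python
def average_activations(activations):
    """Average each hidden neuron's activation over all examples.

    `activations` is a list of m rows; row i holds the hidden-layer
    activations produced by training example i.
    """
    m = len(activations)
    n_hidden = len(activations[0])
    return [sum(row[j] for row in activations) / m for j in range(n_hidden)]

# Hypothetical activations of 3 hidden neurons for 4 training examples.
batch = [
    [0.9, 0.1, 0.0],
    [0.7, 0.0, 0.1],
    [0.8, 0.1, 0.0],
    [0.6, 0.0, 0.1],
]
rho = 0.05                                     # sparsity parameter, close to zero
rho_hat = average_activations(batch)           # one average per hidden neuron
gaps = [value - rho for value in rho_hat]      # how far each neuron is from the target
```

In this made-up batch the first neuron fires strongly on every example, so its average sits far above ρ and a sparsity penalty would push it down.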

This sparsity penalty can be imposed using L1 regularization (see our post) or KL divergence.

A sparse autoencoder network (Source: [4])

L1 Regularization

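We apply L1 regularization to the activations by adding a scaled regularization term to the loss function. Mathematically, it is expressed as follows:

$$ L(x, \hat{x}) + \lambda \sum_{j} \left| a_j^{(2)}(x) \right| $$
Loss term for Sparse Autoencoder

Here, L(x, \hat{x}) is the reconstruction loss between the input x and its reconstruction \hat{x}, a_j^{(2)}(x) is the activation of hidden neuron j, and λ is the scaling factor of the penalty.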
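
A minimal sketch of an L1-penalized loss in plain Python is shown below. Using mean squared error as the reconstruction loss, the value of `lam`, and the function names are assumptions for illustration, not something the formula above prescribes.

```python
def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def l1_sparse_loss(x, x_hat, hidden, lam=1e-3):
    """Reconstruction loss plus a scaled L1 penalty on the hidden activations."""
    l1_penalty = sum(abs(a) for a in hidden)
    return mse(x, x_hat) + lam * l1_penalty

# Hypothetical input, reconstruction, and hidden activations.
x = [1.0, 0.0]
x_hat = [0.5, 0.0]
hidden = [0.5, -0.25, 0.0]
loss = l1_sparse_loss(x, x_hat, hidden, lam=0.1)
```

Because the penalty grows with every non-zero activation, the cheapest way for the network to lower it is to switch most hidden neurons off.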

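Kullback-Leibler Divergence (KL Divergence)

The KL divergence measures how much one probability distribution differs from another. Here, it compares a Bernoulli distribution with mean ρ (the desired average activation) against one with mean \hat{\rho}_j (the observed average activation of hidden neuron j). It is expressed as follows:

$$ KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} $$
KL Divergence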

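The KL divergence is zero when the average activation \hat{\rho}_j equals ρ and grows as the two move apart, so minimizing this term pushes each hidden neuron towards

$$ \hat{\rho}_j = \rho $$
Sparsity Constraint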

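The cost term is defined as:

$$ L(x, \hat{x}) + \beta \sum_{j} KL(\rho \,\|\, \hat{\rho}_j) $$
Loss term for Sparse Autoencoder with KL Divergence

Here, L(x, \hat{x}) is the reconstruction loss, the sum runs over the hidden neurons j, and β controls the weight of the sparsity penalty.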
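
The KL-based penalty can be sketched in plain Python as follows. It assumes the Bernoulli form of the KL divergence used in the Stanford notes listed under Further Reading; the `eps` clamp, the default `beta`, and the function names are illustrative choices.

```python
import math

def kl_bernoulli(rho, rho_hat, eps=1e-8):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).

    rho must lie strictly between 0 and 1; rho_hat is clamped away from
    0 and 1 so the logarithms stay finite.
    """
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def kl_sparsity_penalty(rho, rho_hats, beta=1.0):
    """Scaled sum of the KL terms over all hidden neurons."""
    return beta * sum(kl_bernoulli(rho, r) for r in rho_hats)

rho = 0.05
rho_hats = [0.75, 0.05, 0.10]   # hypothetical average activations
penalty = kl_sparsity_penalty(rho, rho_hats, beta=3.0)
```

The neuron whose average activation already equals ρ contributes nothing, while the very active one dominates the penalty.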

Denoising Autoencoders

In sparse autoencoders, we add a penalty term to the cost function. Denoising autoencoders instead change the reconstruction error term itself. Put simply, a normal autoencoder tries to reconstruct its input image as the output, whereas a denoising autoencoder tries to reconstruct the original, clean image from a corrupted or noisy version of it. This noise is added randomly to the input images.

The flow of a denoising autoencoder (T-shirt image: [1])

Please remember that noise is only added during training.
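
The corruption step can be sketched in plain Python as below. Gaussian noise, pixel values clipped to the range [0, 1], and the `corrupt` helper with its `training` flag are assumptions for this example; the article does not prescribe a particular kind of noise.

```python
import random

def corrupt(x, noise_std=0.1, training=True, rng=random):
    """Return a noisy copy of x during training, an unchanged copy otherwise.

    Assumes pixel values in [0, 1]; noisy values are clipped to that range.
    """
    if not training:
        return list(x)
    return [min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in x]

rng = random.Random(0)
clean = [0.0, 0.25, 0.5, 0.75, 1.0]            # hypothetical pixel values

noisy = corrupt(clean, noise_std=0.1, training=True, rng=rng)
at_test_time = corrupt(clean, training=False)  # no noise outside training
```

During training the network would receive `noisy` as its input while the reconstruction error is still measured against `clean`.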

A denoising autoencoder network (Source: [5])

Contractive Autoencoders

Now that you know what denoising autoencoders are, contractive autoencoders can be explained by comparing the two:

Denoising autoencoders make the reconstruction function resist small but finite-sized perturbations of the input, while contractive autoencoders make the feature extraction function resist infinitesimal perturbations of the input

Deep Learning, MIT Press [2]

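A contractive autoencoder adds a penalty term to the loss function. This sensitivity penalization term is the sum of squares of all partial derivatives of the extracted features with respect to the input dimensions [3]. Mathematically, it is expressed as below:

$$ \lVert J_f(x) \rVert_F^2 = \sum_{i,j} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2 $$
Penalty term of Contractive Autoencoder

Here, h(x) = f(x) is the vector of features extracted by the encoder f, h_j is its j-th component, and x_i is the i-th input dimension.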

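The loss is then calculated as:

$$ L(x, g(f(x))) + \lambda \lVert J_f(x) \rVert_F^2 $$
Loss term for Contractive Autoencoder

Here, f is the encoder, g is the decoder, L is the reconstruction loss, \lVert J_f(x) \rVert_F^2 is the sum of squared partial derivatives of the encoder output with respect to the input, and λ sets the strength of the penalty.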
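
For a single-layer encoder with sigmoid activations, h = sigmoid(Wx + b), the partial derivative of h_j with respect to x_i is h_j (1 - h_j) W_ji, so the penalty can be computed without automatic differentiation. The sketch below assumes that encoder form; the weights, the input, and the function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, weights, biases):
    """Single-layer sigmoid encoder: h_j = sigmoid(sum_i W_ji * x_i + b_j)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def contractive_penalty(x, weights, biases):
    """Sum of squared partial derivatives of the features w.r.t. the inputs.

    For a sigmoid encoder, d h_j / d x_i = h_j * (1 - h_j) * W_ji, so the
    sum over i and j factorizes per hidden unit.
    """
    hidden = encode(x, weights, biases)
    return sum((h * (1.0 - h)) ** 2 * sum(w * w for w in row)
               for h, row in zip(hidden, weights))

# Hypothetical encoder with 3 inputs and 2 hidden units.
W = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]
b = [0.0, 0.1]
x = [0.2, 0.7, 0.5]
penalty = contractive_penalty(x, W, b)
```

The full loss would add `lam * penalty` to the reconstruction error, with `lam` a tuning parameter chosen by the practitioner.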

The main idea of contractive autoencoders is to make autoencoders robust to small perturbations (or disturbances) around the training points. In their experiments, Rifai et al. found that contractive autoencoders extracted features as well as or better than denoising autoencoders [3].

A contractive autoencoder network (Source: Medium)

Variational Autoencoders

A variational autoencoder (VAE) describes the attributes of an image in a probabilistic manner. You can observe the difference in the description of attributes in the pictures below. A regular autoencoder describes each attribute as a single value, while a VAE describes each attribute as a probability distribution, parameterized by the latent vectors μ (mean) and σ (standard deviation).
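
One common way to turn such a (μ, σ) description into a concrete latent vector is to sample z = μ + σ · ε with ε drawn from a standard normal distribution. The article does not spell this step out, so treat the sketch below, including the `sample_latent` helper and the example values, as an assumption for illustration.

```python
import random

def sample_latent(mu, sigma, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per attribute."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)
mu = [0.8, -0.3]      # hypothetical means of two latent attributes
sigma = [0.1, 0.5]    # hypothetical standard deviations

# Each call gives a slightly different encoding of the same image.
samples = [sample_latent(mu, sigma, rng) for _ in range(5)]
```

The first attribute has a small σ, so its samples stay close to its mean, while the second varies much more from draw to draw.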

VAEs are effective in other domains of machine learning. They are used to draw images, achieve strong results in semi-supervised learning, and interpolate between sentences.

A variational autoencoder network (Source: Wiki)


[1] Han Xiao, Kashif Rasul, and Roland Vollgraf (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR, abs/1708.07747.
[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). Deep Learning. MIT Press.
[3] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio (2011). Contractive auto-encoders: explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). Omnipress, Madison, WI, USA, 833–840.
[4] Sanghamitra Dutta, Ziqian Bai, Haewon Jeong, Tze Low, and Pulkit Grover (2018). A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot codes. 1585–1589. 10.1109/ISIT.2018.8437852.
[5] Varun Kumar, G. Nandi, and Rahul Kala (2014). Static hand gesture recognition using stacked Denoising Sparse Autoencoders. 2014 7th International Conference on Contemporary Computing, IC3 2014. 99–104. 10.1109/ic3.2014.6897155.

Further Reading

  1. https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
  2. https://www.deeplearningbook.org/contents/autoencoders.html#pf14
  3. https://blog.keras.io/building-autoencoders-in-keras.html
  4. https://www.jeremyjordan.me/variational-autoencoders/
