Why data normalization is important for non-linear classifiers

The term “normalization” is commonly used as an umbrella for two related operations: standardization and scaling. Standardization typically rescales the data to have a mean of 0 and a standard deviation of 1, whereas scaling changes the range of the values in the dataset.
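As a minimal sketch of the two operations, the snippet below standardizes and min-max scales a single feature using only the Python standard library. The function names `standardize` and `min_max_scale` and the sample values are illustrative assumptions, not part of any particular library, and the code assumes the feature is not constant (otherwise it would divide by zero).

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


# Made-up sample of one feature (weights in kg), for illustration only.
weights_kg = [45.0, 60.0, 72.5, 90.0, 130.0]

standardized = standardize(weights_kg)   # mean ~0, standard deviation ~1
scaled = min_max_scale(weights_kg)       # smallest value -> 0.0, largest -> 1.0
```

Standardization keeps the shape of the distribution but re-centers it, while min-max scaling pins the extremes to a chosen range; which one fits better depends on the data and the classifier.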

As mentioned in [1] and in many other articles, data normalization is required when the features have different ranges. For example, it is important to normalize weight and height features because their values lie on different scales: e.g. [~45-130 kg] for weight and [~120-230 cm] for height.

However, many articles do not mention other applications in which data normalization is also crucial. In those applications, the need for normalization is often less obvious, or normalization is harder to implement, because all the features share a similar scale or because the features carry information relative to one another. Time series in the frequency domain are an example: firstly, the amplitudes of the frequencies are, in principle, on the same scale, and, secondly, normalizing per frequency (feature) may cause a loss of relative information.

So, under what circumstances is data normalization necessary? To answer this question, this article presents an illustrative example that compares the performance of a linear classifier (an SVM classifier with a linear kernel) and a non-linear classifier (an SVM classifier with an RBF kernel) before and after implementing data normalization.

As will be observed, the results indicate that data normalization does not influence the accuracy of the linear classifier, whereas it drastically affects the accuracy of the non-linear classifier.
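One possible intuition for this difference, offered here as a sketch rather than as the article's own experiment, is that the two kernels react differently to the overall scale of the data. The snippet below compares a linear kernel and an RBF kernel on three made-up points before and after multiplying every feature by the same factor. The points, the factor of 100 and the fixed `gamma=1.0` are assumptions chosen for illustration.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2) with a fixed, assumed gamma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def rescale(x, factor):
    """Multiply every feature of x by the same factor."""
    return [factor * a for a in x]


# Three made-up points with features on a small scale.
a, b, c = [0.2, 0.4], [0.3, 0.1], [0.9, 0.8]

# The same points with every feature multiplied by 100.
factor = 100.0
a_big, b_big, c_big = (rescale(p, factor) for p in (a, b, c))

# Linear kernel: all values grow by factor**2, so ratios between pairs are kept.
lin_ratio_small = linear_kernel(a, b) / linear_kernel(a, c)
lin_ratio_big = linear_kernel(a_big, b_big) / linear_kernel(a_big, c_big)

# RBF kernel: the squared distance grows by factor**2 inside the exponential,
# so two points that looked similar become effectively unrelated.
rbf_small = rbf_kernel(a, b)
rbf_big = rbf_kernel(a_big, b_big)

print(f"linear ratio: {lin_ratio_small:.3f} -> {lin_ratio_big:.3f}")
print(f"RBF similarity: {rbf_small:.3f} -> {rbf_big:.3e}")
```

With a fixed `gamma`, the relative structure seen by the linear kernel survives the rescaling, while the RBF similarity between the same two points collapses towards zero. This only illustrates a uniform rescaling; it is not a substitute for the classifier comparison described above.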
