Min-max normalization is one of the most popular ways to normalize data.
For every feature,
- the minimum value of that feature gets transformed into a 0,
- the maximum value gets transformed into a 1,
- and every other value gets transformed into a value between 0 and 1.
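It is calculated by the following formula:

$$v' = \frac{v - \min(A)}{\max(A) - \min(A)}\left(\text{new\_max}(A) - \text{new\_min}(A)\right) + \text{new\_min}(A)$$

where
- v' is the new value of each entry in the data,
- v is the old (current) value of each entry of feature A,
- min(A) and max(A) are the minimum and maximum values of feature A in the data,
- new_max(A) and new_min(A) are the maximum and minimum of the target range (i.e. the required boundary values), respectively.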
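The formula can be sketched in plain Python as follows. This is a minimal illustration, not a library API: the function name `min_max_scale` and its parameters are hypothetical, and it assumes the feature is not constant (max(A) != min(A)), since otherwise the denominator is zero.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers to [new_min, new_max] with min-max normalization.

    Hypothetical helper for illustration; assumes max(values) != min(values).
    """
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("min-max normalization is undefined for a constant feature")
    # v' = (v - min(A)) / (max(A) - min(A)) * (new_max(A) - new_min(A)) + new_min(A)
    return [(v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
            for v in values]


# Usage: salaries for a feature, rescaled to the default range [0, 1]
print(min_max_scale([50_000, 80_000, 100_000]))  # → [0.0, 0.6, 1.0]
```

Passing a different `new_min`/`new_max` (for example -1 and 1) rescales to that range instead, which is what the new_min(A) and new_max(A) terms in the formula are for.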
Let us consider an example to make the calculation clear. Assume that for the feature F,
minimum value = $50,000
maximum value = $100,000
and we want to rescale F to the range 0 to 1.
In accordance with min-max normalization, v = $80,000 is transformed to:

$$v' = \frac{80{,}000 - 50{,}000}{100{,}000 - 50{,}000}\,(1 - 0) + 0 = 0.6$$
As the example shows, this technique makes the data easy to interpret: instead of large dollar amounts, every value lies between 0 and 1, on the same scale as every other normalized feature, so the values can be compared directly without further rescaling.
Min-max normalization has one fairly significant downside: it does not handle outliers very well. For example, if you have 99 values between 0 and 40, and one value is 100, then the 99 values will all be transformed to a value between 0 and 0.4.
That data is just as squished as before!
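The squishing effect can be reproduced with a small sketch. The data is made up for illustration: 99 evenly spaced values from 0 to 40 plus a single outlier of 100, scaled to [0, 1] with the formula above.

```python
# Illustrative data: 99 evenly spaced values between 0 and 40, plus one outlier at 100
values = [40 * i / 98 for i in range(99)] + [100]

v_min, v_max = min(values), max(values)
scaled = [(v - v_min) / (v_max - v_min) for v in values]

# The 99 "normal" values occupy only the bottom 40% of the [0, 1] range,
# while the single outlier alone is mapped to 1.0.
print(max(scaled[:99]))  # → 0.4
print(scaled[-1])        # → 1.0
```

Because the outlier sets max(A), it dictates the scale for every other value, which is why the bulk of the data stays compressed.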
Take a look at the image below to see an example of this.
After normalization, the diagram below shows that the squishing problem on the y-axis is fixed, but the x-axis is still problematic: the orange point is an outlier, which min-max normalization does not handle.
You can normalize your dataset using the scikit-learn class MinMaxScaler.
Good practice for using MinMaxScaler and other scaling techniques is as follows:
- Fit the scaler using the available training data. For normalization, this means the training data is used to estimate the minimum and maximum observable values. This is done by calling the fit() function.
- Apply the scaler to the training data. This gives you normalized data on which to train your model. This is done by calling the transform() function.
- Apply the scaler to data going forward. This means that new data on which you want to make predictions is prepared with the same fitted scaler, so it uses the minimum and maximum learned from the training data.
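The three steps can be sketched with scikit-learn's `MinMaxScaler`. The salary figures reuse the earlier example; the variable names and the extra "new" salaries of 60,000 and 120,000 are illustrative assumptions. Values outside the training range map outside [0, 1], because the scaler keeps the minimum and maximum it learned from the training data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training data: one feature (salary), one row per sample (illustrative values)
X_train = np.array([[50_000.0], [80_000.0], [100_000.0]])

# 1. Fit: estimate the minimum and maximum from the training data only
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)
print(scaler.data_min_, scaler.data_max_)

# 2. Transform the training data before training a model
X_train_scaled = scaler.transform(X_train)
print(X_train_scaled.ravel())

# 3. Transform new data later with the SAME fitted scaler (do not refit on it)
X_new = np.array([[60_000.0], [120_000.0]])
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled.ravel())  # 120,000 is above the training max, so it maps above 1
```

Refitting on the new data would silently change the meaning of 0 and 1 between training and prediction, which is why the fitted scaler is reused.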