## 1. Decimal Scaling

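Decimal scaling normalizes data by moving the decimal point of its values. To normalize the data with this technique, we divide each value by a power of 10 chosen from the maximum absolute value of the data, so that every scaled value has a magnitude below 1. The formula is V’ = V / 10^j, where V’ is the new value after applying decimal scaling, V is the respective value of the attribute, and j is the smallest integer such that max(|V’|) < 1. Example: Suppose a…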
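As a rough illustration of the idea, here is a minimal sketch in plain Python. The function name `decimal_scale` and the sample values are made up for this example, and it assumes j is restricted to non-negative integers, so data already below 1 in magnitude is left unchanged.

```python
def decimal_scale(values):
    """Divide every value by 10**j, with j the smallest non-negative
    integer that brings the largest absolute value below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


# Hypothetical attribute values, chosen only for illustration.
scaled, j = decimal_scale([-986, 120, 917])
print(j)       # 3
print(scaled)  # [-0.986, 0.12, 0.917]
```

The largest absolute value here is 986, so the values are divided by 10^3 = 1000.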

## Data Normalization

Normalization is a data preparation technique that is frequently used in machine learning. It consists of transforming numeric columns to a common scale, i.e. it brings features measured on different scales onto the same scale.  In machine learning, some features take values many times larger than others. The features with…
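To make "a common scale" concrete, the sketch below uses min-max scaling, which is only one of several normalization methods and is picked here as an assumption for illustration. The function name and the two sample columns are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Two made-up features on very different scales.
ages = [20, 30, 40]
incomes = [20000, 60000, 100000]

print(min_max_normalize(ages))     # [0.0, 0.5, 1.0]
print(min_max_normalize(incomes))  # [0.0, 0.5, 1.0]
```

After scaling, both columns span the same range, so neither dominates purely because of its units.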

## II. Multicollinearity – VIF

Multicollinearity is a phenomenon unique to multiple regression. It occurs when two variables that are supposed to be independent are, in reality, highly correlated and overlap in what they measure.  In other words, each variable doesn’t give you entirely new information. To picture what multicollinearity is, let’s start by picturing what it…
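The variance inflation factor (VIF) named in the title is commonly defined as 1 / (1 − R²), where R² comes from regressing one feature on all the others. The following is a minimal sketch of that definition using NumPy least squares; the function name `vif` and the synthetic data are assumptions for illustration, not a reference implementation.

```python
import numpy as np


def vif(X):
    """Return one VIF per column of X (rows = samples, columns = features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Regress feature j on the remaining features plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        factors.append(ss_tot / ss_res)
    return np.array(factors)


# Synthetic data: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
factors = vif(np.column_stack([x1, x2, x3]))
print(factors)
```

The two overlapping features should get large VIF values, while the unrelated one stays close to 1.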

## I. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised technique used in machine learning to reduce the dimensionality of data.  It does so by compressing the feature space: it identifies a subspace that captures most of the information in the complete feature matrix and projects the original feature space onto that lower-dimensional subspace.  The PCA technique is used for those datasets that…

## Dimensionality Reduction

Before jumping into dimensionality reduction, let’s first define what a dimension is. Given a matrix A, the dimension of the matrix is the number of rows by the number of columns.  If A has 3 rows and 5 columns, A would be a 3×5 matrix. “Dimensionality” simply refers to the number of features/variables in your dataset…

## Correlation Filter Methods

Besides duplicate features, a dataset can also include correlated features. “Correlation is defined as a measure of the linear relationship between two quantitative variables.” A high correlation is often a useful property: if two variables are highly correlated, we can predict one from the other. Therefore, we generally look for features that are highly correlated with…
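One way to act on correlated features is sketched below: compute the Pearson correlation matrix and keep a feature only if it is not strongly correlated with one already kept. The function name `drop_correlated`, the 0.9 threshold, and the toy data are all assumptions for illustration; the filter described in the full post may differ.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep features whose absolute correlation with every
    previously kept feature is at or below the threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    keep = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


# Toy data: column "b" is an exact linear function of "a" (b = 2a + 1).
rows = [
    [1, 3, 2],
    [2, 5, -1],
    [3, 7, 4],
    [4, 9, 0],
    [5, 11, 3],
]
kept = drop_correlated(rows, ["a", "b", "c"])
print(kept)  # ['a', 'c']
```

Because "b" can be predicted exactly from "a", it adds no new information and is dropped, while "c" is retained.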