Ridge Regression is a linear regression model used to address some of the problems of Ordinary Least Squares by imposing a penalty on the regression coefficients.
What is Ridge Regression?
We have already seen Ordinary Least Squares. Suppose we have an independent variable X and a dependent variable Y. We can write

Y = Xβ + ε

and our objective function is to minimize the sum of squared residuals:

minimize ||Y − Xβ||²
Suppose our dataset has independent features that are highly correlated with each other, so that when X1 changes, X2 changes with it. If we use OLS, it will estimate the betas with high standard errors, and because of those high standard errors, the confidence intervals of the regression coefficients will be wide. This means we are estimating the betas with low precision. If we test such a model on unseen data, it will produce high error, because our estimated betas are not precise.
In this case we cannot rely on Ordinary Least Squares; we need some method to shrink these regression coefficients (slopes). This is where Ridge Regression plays an important role. Ridge regression is used to analyze data that suffers from multicollinearity, because it penalizes the regression coefficients through a bias term controlled by lambda. Increasing this bias leads to smaller values of the betas (slopes).
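To see this numerically, here is a small simulation (a sketch using NumPy and scikit-learn; the generated data and alpha=1.0 are made up for illustration): two nearly identical features are drawn many times, and the spread of the estimated coefficients is compared between OLS and ridge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
ols_betas, ridge_betas = [], []
for _ in range(200):
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.01, size=100)  # almost perfectly correlated with x1
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=0.5, size=100)  # true betas are (1, 1)
    ols_betas.append(LinearRegression().fit(X, y).coef_)
    ridge_betas.append(Ridge(alpha=1.0).fit(X, y).coef_)

# OLS coefficients swing wildly from sample to sample; ridge's stay stable
print("OLS coefficient std:  ", np.std(ols_betas, axis=0))
print("ridge coefficient std:", np.std(ridge_betas, axis=0))
```

The large standard deviation of the OLS betas is exactly the "high standard error" problem described above.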
To penalize the regression coefficients, ridge regression performs L2 regularization. The objective function becomes

minimize ||Y − Xβ||² + λ||β||²
Notice the extra term: this is the L2 regularization term, and lambda is a tuning parameter. We need to minimize this function to obtain the regression coefficients. If you haven't studied gradient descent yet, I would recommend going through some gradient descent blogs first. Minimizing this function gives the closed-form solution

β̂ = (XᵀX + λI)⁻¹XᵀY
Notice the positive lambda term in the denominator (the inverted matrix). If lambda is 0 there is no penalty and we recover ordinary least squares, but if lambda is very high, the regression coefficients shrink to very small values.
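The shrinkage in this closed-form solution can be checked directly with NumPy (a sketch on made-up data; the lambda values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, lam):
    """beta_hat = (X^T X + lam * I)^(-1) X^T y"""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# as lambda grows, the estimated coefficients shrink toward zero
for lam in [0.0, 1.0, 100.0, 10000.0]:
    print(lam, np.round(ridge_closed_form(X, y, lam), 3))
```

At lambda = 0 the estimate matches OLS; at very large lambda the betas are pushed close to zero.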
If you haven't noticed, we are changing beta here; in other words, we are trying to find a better curve. A very high lambda leads to very small coefficients, which produces an underfit model, so it is important to tune this parameter. We have now seen how ridge regression helps with multicollinearity.
It also helps with the overfitting problem.
If we train our model for too long, it starts to learn the noise in the dataset. In the figure we can see that the model is influenced by noise: if we calculate the slope of the fitted curve, we notice very high slope values. Between data points 1 and 3 the slope changes sharply. Therefore, we can use regularization to shrink the regression coefficients and get a smoother curve.
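This effect can be illustrated by fitting a high-degree polynomial with and without the penalty (a sketch; the degree-12 polynomial, noisy sine data, and alpha=0.1 are arbitrary choices for illustration, not the data from the figure):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=20)  # noisy sine wave
X = PolynomialFeatures(degree=12, include_bias=False).fit_transform(x[:, None])

ols = LinearRegression().fit(X, y)    # chases the noise: huge slopes
ridge = Ridge(alpha=0.1).fit(X, y)    # penalized: much smaller slopes
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("ridge coefficient norm:", np.linalg.norm(ridge.coef_))
```

The unpenalized fit needs enormous coefficients to wiggle through every noisy point; the ridge fit keeps them small and produces a smoother curve.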
Before coding ridge regression, we first need to understand the assumptions:
Ridge regression follows all the assumptions of linear regression:
- Linear relationship between the independent variables and the dependent variable
- Mean of the errors should be zero
- Constant variance of the errors (no heteroscedasticity)
- No correlation between the errors (no autocorrelation)
- No correlation between the errors and the independent variables
- The dependent variable should not have zero variance (zero variance means all its values are the same)
- Normality of residuals (optional)
Let’s code Ridge regression
Here is the code for both ridge and linear regression. First I use grid search to find the best parameter (alpha), then I train the ridge model.
Here are the code snippets for both:
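The original snippet is not shown here, so below is a minimal sketch of that workflow using scikit-learn; the bundled diabetes dataset and the alpha grid are assumptions for illustration:

```python
from sklearn.datasets import load_diabetes  # placeholder dataset for illustration
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# grid search over the penalty strength alpha (the lambda from the equations)
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X_train, y_train)
ridge = grid.best_estimator_

# plain linear regression for comparison
linear = LinearRegression().fit(X_train, y_train)

print("best alpha:", grid.best_params_["alpha"])
print("ridge test R^2: ", ridge.score(X_test, y_test))
print("linear test R^2:", linear.score(X_test, y_test))
```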
Let’s check the regression coefficients of both models
For ridge regression
For Linear regression
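The printed values are not reproduced here; as a sketch (again assuming the diabetes dataset, with alpha=10 as a placeholder), the two sets of coefficients can be inspected like this:

```python
import numpy as np
from sklearn.datasets import load_diabetes  # placeholder dataset for illustration
from sklearn.linear_model import LinearRegression, Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=10).fit(X, y)
linear = LinearRegression().fit(X, y)

# ridge's coefficients are shrunk relative to plain linear regression
print("ridge coefficients: ", np.round(ridge.coef_, 2))
print("linear coefficients:", np.round(linear.coef_, 2))
print("ridge intercept:", round(ridge.intercept_, 2),
      "| linear intercept:", round(linear.intercept_, 2))
```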
We can notice the difference between the coefficients and the intercepts.
We can conclude that whenever we face high regression coefficients (overfitting) or high standard errors of the regression coefficients (multicollinearity), we can use ridge regression to penalize the coefficients. By doing this we reduce both the coefficients and their standard errors.