Logistic regression is one of the most popular machine learning algorithms and falls under the supervised learning technique.
Despite its name, it is a classification algorithm: it predicts a binary outcome, mapping a set of independent variables to an output of 0 or 1.
Ok, so what does this mean?
A binary outcome is one where there are only two possible scenarios— either the event happens (1: positive class) or it does not happen (0: negative class).
So, you know you’re dealing with binary data when the output (dependent) variable is dichotomous or categorical in nature; in other words, when it fits into one of two categories (such as “yes” or “no”, “pass” or “fail”, “true” or “false”, and so on).
How does Logistic Regression work?
Logistic regression differs from linear regression in its hypothesis: instead of using a linear function directly, it passes the linear combination of the inputs through the ‘Sigmoid function’, also known as the ‘logistic function’.
The hypothesis of logistic regression therefore limits the output to values between 0 and 1.
We use the Sigmoid function/curve to predict the categorical value. A Sigmoid function maps any real value into another value between 0 and 1. In machine learning, we use the sigmoid to map predictions to probabilities, and a threshold value then decides the outcome (win/lose). The sigmoid function is:

f(x) = 1 / (1 + e^(-x))

where f(x) = output between 0 and 1 (probability estimate) and x = input to the function.
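The sigmoid formula above can be sketched in a few lines of Python; the example inputs are arbitrary values chosen purely to show how the curve squashes any real number into (0, 1):

```python
import math

def sigmoid(x):
    """Map any real value x to a probability estimate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))   # 0.5: the midpoint of the curve
print(sigmoid(4))   # large positive input, output close to 1
print(sigmoid(-4))  # large negative input, output close to 0
```

Note that the output never actually reaches 0 or 1; it only approaches them as the input grows very negative or very positive.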
Decision Boundary: The prediction function returns a probability score between 0 and 1. To map this to a discrete class (true/false, yes/no), you select a threshold value: probabilities at or above the threshold are classified into class 1, and those below it into class 0.

p ≥ 0.5 → class = 1
p < 0.5 → class = 0
For example, suppose the threshold value is 0.5 and your prediction function returns 0.8: the input will be classified as positive. If the predicted value is 0.2, which is below the threshold, it will be classified as negative. For logistic regression with multiple classes, we select the class with the highest predicted probability.
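The thresholding rule above is a one-liner in code; the probabilities 0.8 and 0.2 are the same illustrative values used in the text, and the multi-class dictionary is made up for the example:

```python
def classify(probability, threshold=0.5):
    """Map a predicted probability to a discrete class label (1 or 0)."""
    return 1 if probability >= threshold else 0

print(classify(0.8))  # 1: 0.8 >= 0.5, positive class
print(classify(0.2))  # 0: 0.2 < 0.5, negative class

# For multiple classes, pick the class with the highest predicted probability
probs = {"cat": 0.2, "dog": 0.7, "bird": 0.1}
print(max(probs, key=probs.get))  # dog
```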
Types of Logistic Regression:
1. Binary Logistic Regression:
The dependent variable has only two possible outcomes/classes.
Example: Male or Female, spam or not spam
2. Multinomial Logistic Regression
The dependent variable has 3 or more possible outcomes/classes without any natural ordering. Example: Predicting food quality (Good, Great, or Bad).
3. Ordinal Logistic Regression
The dependent variable has 3 or more possible outcomes/classes with a natural ordering.
Example: A star rating from 1 to 5
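The binary and multinomial cases above can be sketched with scikit-learn (this assumes scikit-learn is installed; the tiny one-feature datasets below are made up purely for illustration, and `LogisticRegression` handles the multi-class case automatically):

```python
from sklearn.linear_model import LogisticRegression

# Binary logistic regression: two classes (0 = not spam, 1 = spam)
X_bin = [[1.0], [2.0], [8.0], [9.0]]
y_bin = [0, 0, 1, 1]
binary_model = LogisticRegression().fit(X_bin, y_bin)
print(binary_model.predict([[1.5], [8.5]]))  # expect [0, 1]

# Multinomial logistic regression: three unordered classes
X_multi = [[0.0], [1.0], [5.0], [6.0], [10.0], [11.0]]
y_multi = ["Bad", "Bad", "Good", "Good", "Great", "Great"]
multi_model = LogisticRegression().fit(X_multi, y_multi)
print(multi_model.predict([[0.5], [5.5], [10.5]]))
```

Ordinal logistic regression is not covered by scikit-learn's `LogisticRegression` and typically needs a dedicated model that respects the class ordering.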
Classification problems are one of the world’s most widely researched areas. Use cases are present in almost all production and industrial environments. Speech recognition, face recognition, text classification – the list is endless.
Classification models have discrete output, so we need a metric that compares discrete classes in some form. Classification Metrics evaluate a model’s performance and tell you how good or bad the classification is, but each of them evaluates it in a different way.
So, in order to evaluate classification models, we’ll look at these metrics:
- Confusion Matrix (not a metric but fundamental to others)
- Precision and Recall
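To make the two metrics above concrete, here is a minimal sketch that computes the confusion matrix counts, precision, and recall by hand for a binary problem; the label lists are made-up example data:

```python
# Made-up ground-truth labels and model predictions for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion matrix counts: true/false positives and negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # of all predicted positives, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many were found
print(tp, tn, fp, fn)       # 3 3 1 1
print(precision, recall)    # 0.75 0.75
```

In practice you would use `sklearn.metrics.confusion_matrix` and `precision_score`/`recall_score` rather than computing these by hand.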
You can refer to our articles on the above-mentioned error metrics:
- Confusion Matrix
- Accuracy, Precision, Recall, and F1-Score
- The AUC-ROC Curve
You can also refer to this blog from Intellipaat for more insights on the implementation of logistic regression: https://intellipaat.com/blog/what-is-logistic-regression/