- Classifiers generally don’t return a simple “Yes-or-No” answer.
- Mostly, a classification procedure will return a score along a range, as in the example below:
We know that logistic regression returns its result as a probability. Say we are building a logistic regression model to detect whether a breast tumor is malignant or benign. If the model returns a probability of 0.8 for a particular patient, that patient is likely to have a malignant tumor. On the other hand, another patient with a prediction score of 0.2 from the same model is unlikely to have a malignant tumor.
Then, what about a patient with a prediction score of 0.6? In this scenario, we must define a classification threshold to map the logistic regression values into binary categories. For instance, all values above that threshold would indicate ‘malignant’ and values below that threshold would indicate ‘benign.’
By default, most logistic regression implementations use a classification threshold of 0.5, but the right threshold is problem-dependent. We can tune the threshold to get the trade-off we want, for example by lowering it when missing a malignant case is costlier than raising a false alarm.
But how do we tune the threshold, and how do we know which threshold gives us a better classifier? For that, we will use the ROC curve and the Area Under the ROC Curve (AUC). The ROC curve is a tool that helps us choose a threshold, and the AUC summarizes the curve in a single number.
I assume that you already know the terminology of confusion matrices. If you are new to it, please refer to our earlier Model Selection posts on the Confusion Matrix and on Precision and Recall to better understand the ROC curve and AUC.
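As a minimal sketch of how a threshold turns scores into labels, the Python snippet below uses made-up probabilities and a hypothetical helper named `classify`. Treating a score exactly equal to the threshold as "malignant" is an assumption, since the text only covers values above and below it.

```python
# Hypothetical predicted probabilities of "malignant" for five patients.
scores = [0.8, 0.2, 0.6, 0.45, 0.95]


def classify(scores, threshold=0.5):
    """Label each score 'malignant' if it is at or above the threshold, else 'benign'."""
    return ["malignant" if s >= threshold else "benign" for s in scores]


default_labels = classify(scores)       # default threshold of 0.5
strict_labels = classify(scores, 0.7)   # higher threshold: fewer 'malignant' calls

print(default_labels)
print(strict_labels)
```

Note how the patient with a score of 0.6 is labeled malignant at the default threshold but benign at 0.7: the model is the same, only the decision rule has changed.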
Let us go ahead and understand what the ROC curve is and how we use that in machine learning.
A. ROC Curve:
ROC stands for Receiver Operating Characteristic. The ROC curve is a graphical plot. Its purpose is to illustrate our classification model’s ability to distinguish between classes at various thresholds.
It is a visualization used to evaluate and compare the performance of different machine learning models. The graph plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). The true positive rate is the number of true positives divided by the total number of actual positives, and the false positive rate is the number of false positives divided by the total number of actual negatives.
True Positive Rate: True Positive Rate (TPR) is the proportion of actual positive observations that are correctly predicted to be positive, TPR = TP / (TP + FN). It is a synonym for recall.
False Positive Rate: False Positive Rate (FPR) is the proportion of actual negative observations that are incorrectly predicted to be positive, FPR = FP / (FP + TN).
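To make the two definitions concrete, here is a small Python sketch that counts the confusion-matrix cells and derives both rates. The helper name `rates` and the toy labels are made up for illustration, and the code assumes both classes are present (otherwise a denominator would be zero).

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    tpr = tp / (tp + fn)  # share of actual positives that were caught
    fpr = fp / (fp + tn)  # share of actual negatives that were flagged
    return tpr, fpr


# Toy data: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = rates(y_true, y_pred)
print(tpr, fpr)
```

Here 3 of the 4 actual positives are caught (TPR = 0.75) and 1 of the 6 actual negatives is flagged (FPR = 1/6).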
For different threshold values, we will get different TPR and FPR. So, in order to visualize which threshold is best suited for the classifier, we plot the ROC curve.
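The sweep over thresholds can be sketched in a few lines of plain Python. The function name `roc_points` and the toy data are assumptions made for illustration; in practice a library routine such as scikit-learn's `roc_curve` would normally do this work.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy data: true labels and the model's predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the ROC curve. Each point corresponds to one threshold, so picking a point on the curve is the same as picking a threshold.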
The following figure shows what a typical ROC curve looks like.
Alright, now that we know the basics of the ROC curve, let us see how it helps us measure the performance of a classifier.
The ROC curve of a random classifier (as shown above on the right-hand side) is a straight diagonal line from (0, 0) to (1, 1), because such a classifier is expected to have equal TPR and FPR at every threshold. This random classifier ROC curve is considered the baseline for measuring the performance of a classifier.
The two areas separated by this baseline indicate the performance level: a curve in the area above the line is better than random (good), and a curve in the area below it is worse than random (poor).
B. Area Under ROC Curve
AUC stands for Area Under the Curve. It summarizes the ROC curve in a single number between 0 and 1 that tells us how well the model separates the two classes across all thresholds.
The greater the area under this curve (AUC), the greater the model’s ability to separate the responses (e.g., Spam and Not Spam).
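One way to see why a larger AUC means better separation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes the AUC both that way and as a trapezoidal area under a list of ROC points. The function names and the toy data are made up for illustration.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_trapezoid(points):
    """Area under (FPR, TPR) points sorted by FPR, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: true labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
# ROC points (FPR, TPR) for this toy data, one per threshold.
points = [(0, 0), (0, 1/3), (0, 2/3), (1/3, 2/3), (1/3, 1), (2/3, 1), (1, 1)]

auc_rank = auc_pairwise(y_true, scores)
auc_area = auc_trapezoid(points)
print(round(auc_rank, 4), round(auc_area, 4))  # both are 8/9, about 0.8889
```

Of the 9 positive-negative pairs, only one is ranked the wrong way round (the positive at 0.6 versus the negative at 0.7), which is why both calculations give 8/9.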
The figures below show the shape of the curve for different AUC values. Notice how the curve gets flatter as the AUC decreases. In machine learning, we aim to make this AUC as large as possible, typically by choosing the model with the highest AUC.
The ideal curve in the above-left image fills 100% of the area (AUC = 1), which means that the model can distinguish between negative results and positive results 100% of the time (which is almost impossible in real life). The further you go to the right, the worse the detection. The ROC curve to the far right does a worse job than chance, mixing up the negatives and positives (which means you likely have an error in your setup).
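The three regimes can be reproduced with toy scores. In the sketch below, the `auc` helper (the pairwise-ranking form of the AUC, with ties counted as half) and all the data are made up for illustration. A classifier that gives every case the same score stands in for the uninformative baseline.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]

perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every positive outranks every negative
constant = [0.5] * 6                        # same score for everyone: no information
inverted = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # every positive ranked below every negative

auc_perfect = auc(y_true, perfect)
auc_constant = auc(y_true, constant)
auc_inverted = auc(y_true, inverted)

# Flipping the scores of the inverted model recovers a perfect ranking, which is
# why an AUC far below 0.5 often points to swapped labels or scores.
auc_flipped = auc(y_true, [1 - s for s in inverted])

print(auc_perfect, auc_constant, auc_inverted, auc_flipped)
```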
Please refer to this short YT video from StatQuest: