# 3. Cost-sensitive learning

Cost-sensitive learning is another commonly used method for handling classification problems with imbalanced data. In simple terms, it evaluates the cost associated with misclassifying observations.

Recent research has shown that cost-sensitive learning often outperforms sampling methods, which makes it a viable alternative to them.

Traditionally, machine learning algorithms are trained to minimize error: fitting a model on data solves an optimization problem in which we explicitly seek the parameters with the smallest error. A range of functions can be used to calculate a model's error on the training data; the more general term for this quantity is loss. Minimizing the loss of a model on the training data is the same as minimizing its error.

**Error Minimization**: The conventional goal when training a machine learning algorithm is to minimize the error of the model on a training dataset.

In cost-sensitive learning, a penalty is associated with an incorrect prediction and is referred to as a “*cost*.” We could alternatively refer to the inverse of the penalty as a “*benefit*,” although this framing is rarely used.

**Cost**: The penalty associated with an incorrect prediction.

The goal of cost-sensitive learning is to minimize the cost of a model on the training dataset, where it is assumed that different types of prediction errors have different and known associated costs. Cost-sensitive learning methods target the problem of imbalanced learning by using a cost matrix that describes the cost of misclassifying any particular example.

**Cost Minimization**: The goal of cost-sensitive learning is to minimize the cost of a model on a training dataset.
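The distinction between error minimization and cost minimization can be made concrete with a small sketch. The labels, predictions, and cost values below are illustrative assumptions (a false negative is assumed to cost 10, a false positive 1): the classifier with fewer raw errors is not the one with the lower total cost.

```python
# Toy comparison of error minimization vs cost minimization.
# Labels: 1 = positive (minority), 0 = negative. Costs are
# illustrative assumptions: C(FN) = 10, C(FP) = 1.
C_FN, C_FP = 10, 1

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Predictions from two hypothetical classifiers on the same data.
pred_a = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # misses 2 positives (2 FN)
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises 3 false alarms (3 FP)

def errors(y, p):
    """Number of misclassified examples (the error-minimization view)."""
    return sum(t != q for t, q in zip(y, p))

def total_cost(y, p):
    """Weighted cost of mistakes (the cost-minimization view)."""
    fn = sum(t == 1 and q == 0 for t, q in zip(y, p))
    fp = sum(t == 0 and q == 1 for t, q in zip(y, p))
    return C_FN * fn + C_FP * fp

# Classifier A makes fewer errors (2 vs 3), yet classifier B is far
# cheaper (cost 3 vs 20) because its mistakes are low-cost false alarms.
print(errors(y_true, pred_a), errors(y_true, pred_b))          # 2 3
print(total_cost(y_true, pred_a), total_cost(y_true, pred_b))  # 20 3
```

Error minimization would pick classifier A; cost minimization picks classifier B, which is exactly the behavior we want when positives are rare and expensive to miss.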

**Let’s understand it using an example:** We are given a dataset of passengers and want to determine whether a person is carrying a bomb. The dataset contains all the necessary information. A person carrying a bomb is labeled as the positive class, and a person not carrying a bomb as the negative class. The problem is to identify which class each person belongs to. Now, consider the cost matrix.

There is no cost associated with correctly identifying a person with a bomb as positive, or a person without one as negative. But the cost of identifying a person with a bomb as negative (a False Negative) is far higher than that of identifying a person without a bomb as positive (a False Positive), because the consequences are much more dangerous.

**The Cost Matrix** is similar to the confusion matrix, except that here we are more concerned with false positives and false negatives. There is no cost penalty associated with True Positives and True Negatives, as they are correctly identified.

Cost Matrix – The goal of this method is to choose the classifier with the lowest total cost.
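For the bomb-detection example, the cost matrix might be written as a small lookup table indexed by (actual, predicted) class. The specific cost values (10 for a false negative, 1 for a false positive) are illustrative assumptions; only their relative order, C(FN) > C(FP), matters.

```python
# A 2x2 cost matrix indexed as cost[(actual, predicted)].
# The diagonal is zero: correct predictions (TP, TN) carry no penalty.
# Cost values are illustrative assumptions, with C(FN) > C(FP).
cost = {
    ("positive", "positive"): 0,   # true positive: bomb found
    ("positive", "negative"): 10,  # false negative: bomb missed (dangerous)
    ("negative", "positive"): 1,   # false positive: harmless false alarm
    ("negative", "negative"): 0,   # true negative: correctly cleared
}

# A false negative is 10x costlier than a false positive here.
print(cost[("positive", "negative")] / cost[("negative", "positive")])  # 10.0
```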

**Total Cost = C(FN) × FN + C(FP) × FP**

where,

- FN is the number of positive observations wrongly predicted as negative
- FP is the number of negative observations wrongly predicted as positive
- C(FN) and C(FP) are the costs associated with a False Negative and a False Positive, respectively. Remember, C(FN) > C(FP).
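These costs also change how a probabilistic classifier should convert probabilities into decisions. A standard cost-sensitive decision rule (a sketch, again assuming the illustrative costs C(FN) = 10, C(FP) = 1) predicts positive whenever the expected cost of calling an example negative exceeds the expected cost of calling it positive:

```python
# Cost-sensitive decision rule: predict positive whenever the expected
# cost of a negative call, p * C(FN), exceeds the expected cost of a
# positive call, (1 - p) * C(FP). Cost values are illustrative.
C_FN, C_FP = 10, 1

def decide(p_positive):
    """Return 1 (positive) if predicting negative is the costlier bet."""
    return 1 if p_positive * C_FN > (1 - p_positive) * C_FP else 0

# The break-even probability is C(FP) / (C(FP) + C(FN)) = 1/11 ~ 0.09,
# far below the conventional 0.5 cut-off: because missing a bomb is so
# expensive, even weak evidence of the positive class triggers a flag.
threshold = C_FP / (C_FP + C_FN)
print(round(threshold, 4))  # 0.0909
print(decide(0.2))          # 1: flagged despite a low probability
print(decide(0.05))         # 0: below the break-even threshold
```

Note that the threshold depends only on the ratio of the two costs, so doubling both C(FN) and C(FP) leaves the decisions unchanged.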