The term ‘Boosting’ refers to a family of algorithms that converts weak learners to strong learners.
Let’s understand this definition in detail by solving a problem:
Suppose you are given a data set of cat and dog images and asked to build a model that classifies each image into one of the two classes. Like most people, you would start by identifying the images with a few simple rules, such as:
- The image has pointy ears: Cat
- The image has cat-shaped eyes: Cat
- The image has bigger limbs: Dog
- The image has sharpened claws: Cat
- The image has a wider mouth structure: Dog
All these rules help us identify whether an image shows a dog or a cat. However, if we were to classify an image based on a single rule, the prediction would often be wrong. Each of these rules, taken individually, is called a weak learner because it is not strong enough on its own to classify an image as a cat or a dog.
Therefore, to make our prediction more accurate, we can combine the predictions from all of these weak learners by using a majority vote or a weighted average. The combined model is a strong learner.
In the above example, we have defined 5 weak learners and the majority of these rules (i.e. 3 out of 5 learners predict the image as a cat) give us the prediction that the image is a cat. Therefore, our final output is a cat.
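The majority rule from this example can be sketched in a few lines of Python. This is only an illustration: the feature names, the `make_rule` helper, and the assumption that a rule votes for the other class when its feature is absent are all hypothetical and not part of the original example.

```python
from collections import Counter

def make_rule(feature, label_if_present, label_if_absent):
    """Build a weak learner that votes based on a single boolean feature.

    Assumption: when the feature is absent, the rule votes for the other class.
    """
    def rule(image):
        return label_if_present if image[feature] else label_if_absent
    return rule

# The five weak learners from the list above (feature names are hypothetical).
weak_learners = [
    make_rule("pointy_ears", "cat", "dog"),
    make_rule("cat_shaped_eyes", "cat", "dog"),
    make_rule("bigger_limbs", "dog", "cat"),
    make_rule("sharpened_claws", "cat", "dog"),
    make_rule("wider_mouth", "dog", "cat"),
]

def majority_vote(image, learners):
    """Combine the weak learners: the class with the most votes wins."""
    counts = Counter(rule(image) for rule in learners)
    return counts.most_common(1)[0][0]

# A made-up image for which every feature is present:
# three rules vote "cat" and two vote "dog".
image = {
    "pointy_ears": True,
    "cat_shaped_eyes": True,
    "bigger_limbs": True,
    "sharpened_claws": True,
    "wider_mouth": True,
}

votes = [rule(image) for rule in weak_learners]
print(votes)
print(majority_vote(image, weak_learners))
```

With five voters and two classes a tie is impossible, which is why an odd number of weak learners is convenient for a plain majority vote.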
‘How does boosting identify weak rules?’
To find weak rules, we apply a base learning (ML) algorithm repeatedly, each time with a different distribution (weighting) over the training data. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process. After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
The boosting technique follows a sequential order. First, a model is built from the training data. The output of one base learner becomes the input to the next: a second model is then built that tries to correct the errors of the first. That is, if an observation is misclassified by a base classifier (red box), its weight is increased (over-weighting), so that the next base learner is more likely to classify it correctly.
This procedure is continued and models are added until either the complete training data set is predicted correctly or the maximum number of models is added.
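The sequential procedure described above can be sketched as follows. The text does not name a specific algorithm, so this sketch assumes an AdaBoost-style weight update, uses one-dimensional threshold rules ("stumps") as the weak learners, and runs on a small made-up data set. All function names are illustrative.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` left of the threshold, the opposite class otherwise."""
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for p in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, p) != y)
            if best is None or err < best[0]:
                best = (err, t, p)
    return best

def ensemble_predict(x, ensemble):
    """Strong learner: sign of the weighted sum of the weak learners' votes."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def boost(xs, ys, max_models=10):
    n = len(xs)
    weights = [1.0 / n] * n          # start with a uniform distribution
    ensemble = []
    for _ in range(max_models):      # stop when the maximum number of models is reached
        err, t, p = fit_stump(xs, ys, weights)
        if err >= 0.5:               # weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, t, p))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, p))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Stop early once the whole training set is predicted correctly.
        if all(ensemble_predict(x, ensemble) == y for x, y in zip(xs, ys)):
            break
    return ensemble

# Made-up data: no single threshold separates the two classes (+1 and -1).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = boost(xs, ys)
predictions = [ensemble_predict(x, ensemble) for x in xs]
print(len(ensemble), predictions == ys)
```

On this toy data no single stump gets fewer than three points wrong, but three reweighted stumps together classify every training point correctly, so the loop stops on the first of the two stopping conditions.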
Boosting can take several forms, including:
- Gradient Boosting