1. Decision Tree
“A graphical representation of all possible solutions to a decision based on a certain condition”.
A decision tree is a supervised machine learning algorithm. It can be used for both regression and classification problems, but it is mostly used for classification. A decision tree follows a set of if-else conditions to visualize the data and classify it according to those conditions.
It is a classifier in the form of a tree structure used to determine a course of action.
Before learning more about decision trees, let's get familiar with some of the terminology:
- Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is further divided into two or more homogeneous sets.
- Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
- Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions. It breaks down a data set into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed.
- Branch/Sub Tree: A subtree formed by splitting a node. Each branch of the tree represents a possible decision.
- Pruning: Pruning is the process of removing the unwanted branches from the tree.
- Parent/Child node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.
Example:

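The original example figure is not reproduced here. As a substitute, here is a minimal runnable sketch (assuming Python with scikit-learn, which is our choice of library, not one prescribed by the article) that trains a small decision tree and prints its learned if-else structure:

```python
# Train a small decision tree on the Iris dataset and print its structure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# The root node represents the entire dataset; splits use entropy.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders the tree as nested if-else conditions,
# from the root node down to the leaf nodes.
print(export_text(clf, feature_names=iris.feature_names))
```

Each printed condition is a decision node, and the `class:` lines at the ends of the branches are leaf nodes.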
Now you might be wondering: how do I know which attribute should be the root node? Which attributes should be the decision nodes? When should I stop splitting? To decide this, there are principled criteria.
Principled Criteria
1. Entropy
Entropy is an information theory metric that measures the impurity or uncertainty present in our data, and it determines how a decision tree chooses to split data. A set whose examples all belong to one class is pure, while a set with an even mix of classes is maximally impure.
We can calculate entropy using the following formula:

$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $c$ is the number of classes and $p_i$ is the probability of getting the $i$th class when randomly selecting one example from the set $S$.
For example, consider a group of green and red circles.
In this group, we have 14 circles, out of which 10 are green (10/14) and 4 are red (4/14). Let’s find the entropy of this group.
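Plugging these probabilities into the entropy formula:

$$E = -\frac{10}{14}\log_2\frac{10}{14} - \frac{4}{14}\log_2\frac{4}{14} \approx 0.347 + 0.516 \approx 0.863$$

So this mixed group is fairly impure, though not maximally so.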
The entropy of a group in which all examples belong to the same class is always 0, while the entropy of a group split 50/50 between two classes is always 1.
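These boundary cases are easy to verify in code. Below is a small sketch in plain Python (the `entropy` helper name is ours, not the article's) that computes the entropy of a group from its per-class counts:

```python
import math

def entropy(counts):
    """Entropy of a group, given the number of examples in each class."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total)
               for c in counts if c > 0)  # empty classes contribute nothing

print(entropy([14, 0]))  # all one class     -> 0.0
print(entropy([7, 7]))   # 50/50 split       -> 1.0
print(entropy([10, 4]))  # the circles above -> ~0.863
```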
2. Information Gain
Information gain (IG) measures how much “information” a feature gives us about the class. It tells us how important a given attribute of the feature vectors is. Information gain (IG) is used to decide the ordering of attributes in the nodes of a decision tree.
We can use information gain to measure how good a split of a node is; the calculation below should help us understand this concept better.
$$\text{Gain} = E_{\text{parent}} - E_{\text{children}}$$

where $E_{\text{children}}$ is the weighted average entropy of the child nodes, with each child weighted by the fraction of examples it receives.
Let's look at an example of how to calculate information gain. Say a set of 30 people, both male and female, is split according to age: each person's age is compared to 30, and the people are separated into two child groups, with the entropy of each node calculated.

The entropies of the parent and child nodes are calculated first, and the information gain is then derived from them, as the sketch below illustrates.

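The exact compositions of the parent and child groups were given in the original figures, which are not reproduced here, so the counts in the sketch below are invented purely to illustrate the mechanics. It reuses the `entropy` helper defined above; the information gain is the parent's entropy minus the weighted average of the children's entropies:

```python
def information_gain(parent_counts, children_counts):
    """Gain = E(parent) - weighted average of E(child) over the children."""
    total = sum(parent_counts)
    e_children = sum(sum(child) / total * entropy(child)
                     for child in children_counts)
    return entropy(parent_counts) - e_children

# Hypothetical: 30 people (16 male / 14 female), split on age < 30.
parent = [16, 14]
children = [[13, 2], [3, 12]]  # invented counts, for illustration only
print(information_gain(parent, children))  # ~0.35
```

A higher gain means the split produces purer child groups, so the attribute that yields the highest gain is the better question to ask at that node.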
The steps to construct a decision tree using information gain are shown below, followed by a code sketch:
Step 1: Choose the attribute ‘A’ with the highest information gain from the set as the root node.
Step 2: Construct a child node for each value of A.
Step 3: Repeat recursively on each child's subset until the whole tree is built.
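A bare-bones sketch of this recursion, in the style of the classic ID3 procedure (the data representation and stopping rules here are our own simplifications, and it reuses the `information_gain` helper from above):

```python
from collections import Counter

def build_tree(rows, attributes):
    """rows: list of (features_dict, label) pairs; attributes: unused attribute names."""
    labels = [label for _, label in rows]
    # Stop at a pure node, or when no attributes remain:
    # return a leaf carrying the majority class label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: pick the attribute with the highest information gain.
    def gain(attr):
        parent = list(Counter(labels).values())
        groups = {}
        for feats, label in rows:
            groups.setdefault(feats[attr], []).append(label)
        children = [list(Counter(g).values()) for g in groups.values()]
        return information_gain(parent, children)

    best = max(attributes, key=gain)

    # Step 2: construct a child node for each observed value of that attribute.
    tree = {best: {}}
    for value in {feats[best] for feats, _ in rows}:
        subset = [(f, lbl) for f, lbl in rows if f[best] == value]
        remaining = [a for a in attributes if a != best]
        # Step 3: repeat recursively on each subset until the tree is built.
        tree[best][value] = build_tree(subset, remaining)
    return tree
```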
Entropy and information gain are the two main concepts used when constructing a decision tree: they determine which attributes become nodes and the best way to split.
Disadvantages of Decision Trees:
- They are prone to overfitting, especially when the tree is grown deep; pruning is usually needed for good generalization.
- They are unstable: small changes in the training data can produce a very different tree.
- Greedy splitting does not guarantee a globally optimal tree, and plain information gain is biased toward attributes with many distinct values.