Support Vector Machines are a set of supervised learning methods used for classification, regression, and outlier detection.
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyper-plane.
The idea behind SVM is simple: it takes labeled training data as input and outputs a line or hyperplane that separates the data into classes.
For example, suppose we first train the machine to recognize what apples and strawberries look like.
Given this pre-labeled data of apples and strawberries, we can easily train our model to identify each fruit. Then, whenever we give it new, unknown data, it can classify it as a strawberry or an apple.
That’s SVM in play.
It analyses the data and classifies it into one of the two categories based on the labeled data it already has. It will sort the apples under the apple category and the strawberries under the strawberry category.
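The apple/strawberry workflow above can be sketched with scikit-learn's `SVC`. The features (weight in grams, a redness score) and the data points are made up purely for illustration:

```python
from sklearn.svm import SVC

# Hypothetical pre-labeled data: [weight_g, redness_score]
X = [[150, 0.30], [170, 0.35], [160, 0.25],   # apples
     [12, 0.90], [15, 0.85], [10, 0.95]]      # strawberries
y = ["apple", "apple", "apple",
     "strawberry", "strawberry", "strawberry"]

# Train a linear SVM on the labeled examples
clf = SVC(kernel="linear")
clf.fit(X, y)

# Classify a new, unknown fruit
print(clf.predict([[14, 0.88]])[0])  # a small, very red fruit
```

Here the model sorts the new point into the strawberry category because it falls on the strawberry side of the learned hyperplane.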
How Does SVM Work?
Before delving deep into the working of SVM, let us quickly understand the following terms.
- Hyperplane – A hyperplane is the decision boundary that separates a set of objects belonging to different classes. Its dimension depends on the number of features: if the number of input features is 2, the hyperplane is just a line; if the number of input features is 3, the hyperplane becomes a two-dimensional plane.
- Support Vectors – Support vectors are the data points that lie on or nearest to the margin boundaries and influence the position of the hyperplane. Only these closest points contribute to the result of the algorithm; the other points do not. If a data point is not a support vector, removing it has no effect on the model. Deleting a support vector, on the other hand, will change the position of the hyperplane.
- Margin – The margin is the gap between the hyperplane and the support vectors. The marginal lines are pushed apart from each other until each touches the nearest points of its class; the hyperplane is then drawn along the middle of the marginal lines.
- Kernel function – These are the functions used to determine the shape of the hyperplane and decision boundary.
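These terms can be seen concretely by fitting a linear SVM on a small toy dataset (the two clusters below are assumed for illustration) and inspecting its `support_vectors_` attribute:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data)
X = np.array([[1, 1], [2, 1], [1, 2],
              [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin classifier
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# Only the points nearest the boundary are kept as support vectors;
# the remaining points could be removed without moving the hyperplane.
print(clf.support_vectors_)
```

Note that points deep inside each cluster, such as (1, 1) or (6, 5), do not appear in the output: they are not support vectors.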
Let us understand the working of SVM with an example involving two classes: class A (circles) and class B (triangles).
Now, we want to apply the SVM algorithm and find out the best hyperplane that divides both classes.
1. Linearly Separable Data:
SVM takes all the data points into consideration and outputs a line, called a hyperplane, that divides the two classes. This line is termed the decision boundary.
Anything that falls on the circle side of the boundary belongs to class A, and anything on the triangle side belongs to class B.
SVM’s way to find the best line/hyperplane:
The SVM algorithm finds the points from both classes that lie closest to the line; these are the support vectors. It then computes the distance between the line and the support vectors, which is the margin, and aims to maximize it. The hyperplane for which the margin is maximum is the optimal hyperplane.
Any hyperplane can be written as the set of points x satisfying:

w · x − b = 0

where w is the normal vector to the hyperplane and b determines its offset from the origin.
1. Linear SVM – Hard Margin Classifier:
If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible.
Note: In order to find the maximal margin, we need to maximize the margin between the data points and the hyperplane.
The distance between these two hyperplanes is computed using the point-to-plane distance formula. We also have to prevent data points from falling into the margin; these constraints state that each data point must lie on the correct side of the margin.
2. Linear SVM – Soft Margin Classifier:
As most of the real-world data are not fully linearly separable, we allow SVM to make a certain number of mistakes and keep the margin as wide as possible so that other points can still be classified correctly.
It is better to have a large margin, even if some constraints are violated. A margin violation means choosing a hyperplane that allows some data points to fall either on the incorrect side of the hyperplane, or between the margin line and the correct side of the hyperplane.
2. Linearly Non-separable Data:
Not all data are linearly separable. In fact, in the real world, most data are randomly distributed, which makes it hard to separate different classes linearly. SVM can also perform non-linear classification. For a non-linearly separable dataset, say circles surrounded by a ring of triangles, it is obviously not possible to draw a straight line that divides the classes.
Such data can be made linearly separable by mapping it into a higher dimension.
Let’s create one more dimension and name it z.
Hold on!! Let’s see what it will look like.
How do we calculate values for z?
Well, it can be done using the following equation: z = x² + y² – equation (1)
By adding this dimension, we get a three-dimensional space, and the data becomes linearly separable. Since we are now in three dimensions, the hyperplane we obtain is parallel to the x–y plane at a particular value of z (say d), so d = x² + y² (from equation 1).
We can see that this is the equation of a circle. Hence, we can project our linear separator in the higher-dimensional space back to the original dimensions using this equation.
Yayy, here we go. Our decision boundary or hyperplane is a circle, which separates both classes efficiently.
In this higher-dimensional space, it is easy for the SVM classifier to place a linear hyperplane between the two classes.
But another curious question arises: do we have to implement this feature mapping ourselves to make a hyperplane? The answer is no. The SVM algorithm takes care of that by using a technique called the kernel trick. Please refer to our post on kernel functions here.
To know more:
- To implement the SVM classifier, please refer to:
- A guide to implementing the basics of SVM using Python: refer to the article from analyticsdimag: