II. Unsupervised Learning – Clustering
Clustering:
A Hospital Care chain wants to open a series of Emergency-Care wards within a region. We assume that the hospital knows the locations of all the accident-prone areas in the region. It has to decide the number of Emergency Units to be opened and the locations of these Emergency Units, so that every accident-prone area is covered in the vicinity of some Emergency Unit.
The challenge is to decide the location of these Emergency Units so that the whole region is covered. Here is when Clustering comes to the rescue!
A cluster is a group of similar objects, and clustering is the task of grouping objects into such clusters. To learn to cluster, it is important to understand the scenarios that lead different objects to be grouped together. Let us identify a few of them.
What is Clustering?
Clustering is dividing data points into homogeneous classes or clusters:
- Points in the same group are as similar as possible
- Points in different groups are as dissimilar as possible
K-means clustering is the most widely used clustering algorithm.
1. K-means Clustering
- A centroid-based algorithm and one of the simplest unsupervised learning algorithms.
- The algorithm attempts to minimize the variance of the data points within each cluster (the within-cluster sum of squares). It is a common entry point into unsupervised machine learning.
- K-means works best on small to medium data sets, because every iteration visits all data points. If the data set contains a very large number of points, partitioning them will take a long time.
The algorithm works as follows, assuming we have inputs x1, x2, x3, …, xn and a value of K:
- Step 1 – Pick K random points as cluster centers, called centroids.
- Step 2 – Assign each xi to the nearest cluster by calculating its distance to each centroid.
- Step 3 – Find the new cluster centers by taking the average of the points assigned to each cluster.
- Step 4 – Repeat Steps 2 and 3 until none of the cluster assignments change.
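The four steps above can be sketched in plain Python. This is a minimal illustration rather than an optimized implementation; the function name, the 2-D tuples, and the fixed random seed are assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means sketch following the four steps above.

    points: list of equal-length tuples, e.g. [(x, y), ...]
    """
    rng = random.Random(seed)
    # Step 1: pick k random points as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the average of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids (hence the assignments) no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups of points, this converges to one cluster per group within a few iterations.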
Implementation of K-means clustering with an example:
STEP 1 – INITIALIZATION
[Figure not recovered: the seven elements and the two randomly chosen initial centroids.]
STEP 2.1 – FINDING THE NEAREST CENTROID FOR EVERY ELEMENT
For every element, calculate its distance from each centroid (Euclidean distance).
STEP 2.2 – ASSIGNING ELEMENTS TO CLUSTERS
Thus we obtain 2 clusters containing {1, 2, 3} and {4, 5, 6, 7}.
New centroids are obtained by averaging the points in each cluster.
[Figure not recovered: the recomputed centroids.]
STEP 3 – ASSIGNING ELEMENTS TO THE NEW CLUSTERS ACCORDING TO DISTANCE
[Figure not recovered: the distances to the new centroids, which move element 3 to the second cluster.]
STEP 4 – CONVERGENCE
Repeating the assignment produces no further changes, so the algorithm halts here, and the final result consists of 2 clusters: {1, 2} and {3, 4, 5, 6, 7}.
HOW TO CHOOSE K?
- Use the Elbow method: run K-means for a range of K values and pick the K at which the within-cluster variance stops decreasing sharply.
- Use another clustering method, like EM (Expectation-Maximization).
- Run the algorithm on the data with several different values of K and compare the results.
- Use prior knowledge about the characteristics of the problem.
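As a concrete sketch of the Elbow method, the snippet below runs scikit-learn's KMeans for several values of K on made-up blob data and records the inertia (within-cluster sum of squares); the blob centers, sizes, and seeds are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: three well-separated blobs around (0,0), (5,5) and (10,10)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 5, 10)])

# Run K-means for several values of K and record the inertia
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 7)
}
```

Plotting `inertias` against K shows a sharp drop up to K = 3 (the true number of blobs) and only marginal improvement afterwards; that "elbow" is the suggested K.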
K-Means clustering can be done using the Scikit-learn package:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
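Here is a minimal usage sketch of scikit-learn's KMeans, tying back to the hospital scenario from the introduction; the hotspot coordinates and the choice of K = 2 are made-up assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical accident-hotspot coordinates (made-up illustrative data)
hotspots = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
                     [8.0, 8.0], [8.3, 7.7], [7.9, 8.4]])

# Fit K-means with K=2 Emergency Units
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(hotspots)

print(model.cluster_centers_)  # candidate Emergency Unit locations
print(model.labels_)           # which unit covers each hotspot
```

The cluster centers are the candidate locations for the Emergency Units, and each hotspot's label says which unit covers it.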
Thanks for reading!!
You can also go ahead and read our previous posts on Supervised Learning Algorithms.