Steps of Machine Learning

The different stages of building a machine learning model:

Please refer to our section on machine learning models.


1. Gathering Data

First things first: you need to understand the business problem you are facing and what its main goals are.

Data is power. When the problem is clear, and an appropriate machine learning approach is established, it’s time to collect data. This step is data-centric: determine how much data is needed, what type of data is needed, where to get the data… and get the data. 

As you know, machines initially learn from the data that you give them. It is of the utmost importance to collect reliable data so that your machine learning model can find the correct patterns. 

As an example, suppose that you, as a machine learning engineer, are given the task of building a model that automatically predicts whether a given fruit is an orange or an apple.


2. Preparing Data

Collected data is messy. There are many problems that machine learning engineers face when dealing with raw data. After you have your data, you have to prepare it. You can do this by putting together all the data you have and randomizing it. 

This step involves two important tasks: data cleaning and data transformation.

a. Data cleaning:

  • Relevant data should be kept, and irrelevant data should be filtered out;
  • Noisy, misleading, and erroneous samples should be identified and removed;
  • Outliers should be recognized and eliminated;
  • Missing values should be spotted and either removed or imputed by proper methods;
  • Data should be converted to proper formats.
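
The cleaning steps above can be sketched in plain Python. This is only a minimal illustration: the fruit records, the field names `weight_g` and `label`, the plausible weight range, and the choice of median imputation are all assumptions made up for this example.

```python
from statistics import median

# Hypothetical raw records: weight in grams (as text, possibly missing) and a label.
raw_records = [
    {"weight_g": "150", "label": "apple"},
    {"weight_g": "170", "label": "apple"},
    {"weight_g": None, "label": "orange"},    # missing value
    {"weight_g": "140", "label": "orange"},
    {"weight_g": "9000", "label": "apple"},   # implausible outlier
    {"weight_g": "160", "label": "banana"},   # irrelevant to the task
    {"weight_g": "130", "label": "orange"},
]

def clean(records, valid_labels=("apple", "orange"), weight_range=(50.0, 500.0)):
    """Filter irrelevant rows, convert types, drop outliers, impute missing weights."""
    # 1. Keep only the rows that are relevant to the task.
    rows = [r for r in records if r["label"] in valid_labels]
    # 2. Convert weights from text to floats; keep None for missing values.
    rows = [
        {"weight_g": float(r["weight_g"]) if r["weight_g"] is not None else None,
         "label": r["label"]}
        for r in rows
    ]
    # 3. Drop rows whose weight falls outside the assumed plausible range.
    low, high = weight_range
    rows = [r for r in rows if r["weight_g"] is None or low <= r["weight_g"] <= high]
    # 4. Impute the remaining missing weights with the median of the known ones.
    fill = median(r["weight_g"] for r in rows if r["weight_g"] is not None)
    return [
        {"weight_g": r["weight_g"] if r["weight_g"] is not None else fill,
         "label": r["label"]}
        for r in rows
    ]

cleaned = clean(raw_records)
```

Whether to remove or impute a missing value, and what counts as an outlier, depends on the dataset; the thresholds here are placeholders.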

b. Data Transformation:

It is the process of modifying the data according to predefined rules. At this stage the data is also split into three parts: one for training, one for validation, and one for testing.

For example, you can split the data as follows:

  • 10% for testing,
  • 10% for validation during training,
  • 80% for training.

Data Transformation: Train, Validate, and Test
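
A minimal sketch of the randomize-then-split step, using only the standard library. The function name `split_dataset`, the fixed seed, and the stand-in list of 100 integers are assumptions for illustration.

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of the samples, then split into train/validation/test."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

samples = list(range(100))  # stand-in for 100 prepared examples
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # prints: 80 10 10
```

Shuffling before splitting matters when the raw data is ordered (for example, all apples first), since otherwise one of the parts could end up containing only one class.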

3. Choosing a Model

A machine learning model determines the output you get after running a machine learning algorithm on the collected data. It is important to choose a model that is relevant to the task at hand.

Over the years, scientists and engineers have developed various models suited to different tasks: some are well suited to image data, others to sequences (like text or music), some to numerical data, and others to text-based data.

4. Training

Training is often considered the core of machine learning. In training, you pass the prepared data to your machine learning model so that it can find patterns and make predictions. The model learns from the data so that it can accomplish the task it was set. Over time, with training, the model gets better at predicting.

In this step, we will use our data to incrementally improve our model’s ability to predict whether a given fruit is an apple or an orange.


Consider the simplest possible model, a straight line:

    y = mx + b

where x is the input, m is the slope of the line, b is the y-intercept, and y is the value of the line at position x.

The values we have available to us for adjusting, or “training”, are m and b. There is no other way to affect the position of the line since the only other variables are x, our input, and y, our output.

In machine learning, there are many m’s, since there may be many features. The collection of these m values is usually arranged into a matrix, which we will denote W, for the “weights” matrix. Similarly, the b values are arranged together and called the biases.
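
As a small sketch of what this means for a single output, the prediction is a weighted sum of the features plus a bias. The function name `predict` and the numbers are made up for illustration; with one feature this reduces to the line with slope m and intercept b.

```python
def predict(weights, bias, features):
    """Linear model: weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

weights = [2.0, -1.0]   # one weight ("m") per feature
bias = 0.5              # the "b" term
features = [3.0, 4.0]

y = predict(weights, bias, features)  # 2*3 + (-1)*4 + 0.5 = 2.5
```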

Training: Update Weights and Biases on Each Iteration

The training process involves initializing some random values for W and b and attempting to predict the output with those values. We can compare our model’s predictions with the output that it should produce, and adjust the values in W and b such that we will have more correct predictions.

This process then repeats. Each iteration or cycle of updating the weights and biases is called one training “step”.
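
The loop described above can be sketched for the single-feature line with slope m and intercept b. This is a minimal illustration assuming gradient descent on the mean squared error, which is one common way of adjusting the values; the function name `train_line`, the learning rate, the step count, and the toy data are assumptions.

```python
import random

def train_line(xs, ys, learning_rate=0.05, steps=2000, seed=0):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    m, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial values
    n = len(xs)
    for _ in range(steps):
        # Predict with the current parameters and compare with the targets.
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to m and b.
        grad_m = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # One training "step": nudge m and b to reduce the error.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
m, b = train_line(xs, ys)
```

On this toy data the learned values approach m = 2 and b = 1. A learning rate that is too large makes the updates overshoot and diverge, which is one reason it is treated as a value to tune.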

5. Evaluation

After training your model, you have to check how it is performing. This is done by testing the model on previously unseen data: the test set that you split off from your data earlier. If you tested on the same data that was used for training, you would not get an accurate measure, because the model has already seen that data and finds the same patterns in it as before. This would give you a misleadingly high accuracy.

When evaluated on the test data, you get a realistic measure of how your model will perform, as well as of its speed.
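
For a classification task such as the apple/orange example, a few common metrics can be computed by hand. The sketch below assumes hypothetical true labels and predictions for eight test fruits; the labels are invented for illustration and are not results from a real model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical test-set labels and model predictions.
y_true = ["apple", "apple", "orange", "orange", "apple", "orange", "orange", "apple"]
y_pred = ["apple", "orange", "orange", "orange", "apple", "apple", "orange", "apple"]

test_accuracy = accuracy(y_true, y_pred)                      # 6 of 8 correct
apple_precision, apple_recall = precision_recall(y_true, y_pred, "apple")
```

Accuracy alone can be misleading when one class is much more frequent than the other, which is why precision and recall are often reported alongside it.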

Evaluation Metrics for Classification and Regression Models

6. Parameter Tuning

Once you have created and evaluated your model, see whether its accuracy can be improved in any way. This is done by tuning the parameters of your model. The parameters meant here are the variables that the programmer generally sets, rather than the ones learned from the data.

For some particular values of these parameters, the accuracy will be at its maximum; parameter tuning refers to finding these values. There were a few parameters we implicitly assumed when we did our training, and now is a good time to go back, test those assumptions, and try other values.

Model Parameters: Model Design + Hyperparameters

These parameters are typically referred to as “hyperparameters”. The adjustment, or tuning, of these hyperparameters remains a bit of an art; it is an experimental process that depends heavily on the specifics of your dataset, model, and training process.

They are crucial to producing a successful machine learning model.
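
One simple way to tune a hyperparameter is to try several candidate values and keep the one that performs best on the validation set. The sketch below does this for the learning rate of a small line-fitting model; the helper names, the candidate values, the step count, and the toy data are assumptions for illustration.

```python
def train_line(xs, ys, learning_rate, steps=200):
    """Fit y = m*x + b by gradient descent, starting from m = b = 0."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        grad_m = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

def mse(m, b, xs, ys):
    """Mean squared error of the line on the given data."""
    return sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 2x + 1, split into training and validation parts.
train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0]
val_x, val_y = [5.0, 6.0], [11.0, 13.0]

# Try each candidate learning rate and record the validation error.
results = {}
for lr in (0.0001, 0.001, 0.01, 0.1):
    m, b = train_line(train_x, train_y, learning_rate=lr)
    results[lr] = mse(m, b, val_x, val_y)

best_lr = min(results, key=results.get)  # candidate with the lowest validation error
```

The candidates are compared on the validation set rather than the test set, so that the test set stays unseen for the final evaluation.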

7. Prediction

Machine learning is about using data to answer questions. Prediction, or inference, is the step where we finally get to answer those questions. This is the point of all the work so far, and it is where the value of machine learning is realized.

In the end, you can use your trained model to make predictions on new, unseen data.
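
To make the inference step concrete, here is a minimal sketch for the apple/orange example. It assumes a nearest-centroid classifier, chosen here only because it is short, and invented measurements (weight in grams and a surface roughness score between 0 and 1); none of this comes from a real dataset.

```python
from statistics import mean

# Hypothetical training examples: (weight in grams, roughness 0..1) and a label.
training = [
    ((150.0, 0.10), "apple"), ((170.0, 0.20), "apple"), ((160.0, 0.15), "apple"),
    ((130.0, 0.80), "orange"), ((140.0, 0.90), "orange"), ((135.0, 0.85), "orange"),
]

def fit_centroids(data):
    """'Train' by averaging the feature vectors of each class."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(column) for column in zip(*rows))
            for label, rows in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

centroids = fit_centroids(training)

# Inference on fruits the model has never seen.
first = predict(centroids, (165.0, 0.20))
second = predict(centroids, (132.0, 0.90))
```

Here `first` comes out as "apple" and `second` as "orange". Because the two features are on very different scales, weight dominates the distance; in practice the features would usually be scaled first.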

To implement all these 7 steps of machine learning please check this article:

Please check out the notebook on Github for the source code.
