Everything you need to know about Reinforcement Learning

The phrase “Reinforcement Learning” may sound a little intimidating at first, but when we break it down, it’s actually quite simple. Let’s start with the phrase itself. What does the word “reinforce” mean? No! Don’t start googling already! I’ll tell you. It simply means to strengthen or support something. So Reinforcement Learning means learning in which good behavior is strengthened through feedback. Let me elaborate.

Reinforcement Learning is one of the 3 branches of Machine Learning:

The 3 main branches of Machine Learning

In the following sections of the article, I am going to cover everything that is required by a beginner to get started with Reinforcement Learning. So just sit back and enjoy the ride!

A simple definition of Reinforcement Learning

Reinforcement Learning is a type of machine learning in which an agent, placed in an unknown environment, tries to reach a goal. In the absence of a dataset, the agent learns by being rewarded for good actions and punished for bad ones.

When the agent performs an action, the environment moves to a new state, and the agent receives feedback on whether its action led to a good state or a bad one. Let me give you a quick example.

Consider an agent in an environment. Say there’s a fire, and a fire extinguisher and a wrench are present in the environment.

Initial State of the Environment – Credits: Canva

The goal here is to put out the fire efficiently. Let’s see how the agent solves the problem using reinforcement learning.

State 1

State 1 of the Environment – Credits: Canva

Initially, the agent has no knowledge of the consequence of its actions. So let’s say the agent approaches the fire. Yes, I know it’s like the worst possible choice, but the agent doesn’t know that. In order to understand that approaching fire is dangerous, the agent has to approach it, get hurt (negative reward), and realize that it’s the wrong thing to do. 

Another thing to consider is how good or bad a decision is. Here, approaching the fire is a very bad decision, so it’s punished accordingly with 3 warnings. Now the agent knows it’s a bad decision.

State 2

State 2 of the Environment – Credits: Canva

Now the agent decides to go toward the wrench. It is not harmed by performing this action but it still counts as a bad decision because the goal is to put out the fire and a wrench will not help in doing so. So it is again punished, but less severely than before, with just 1 warning.

State 3

State 3 of the Environment – Credits: Canva

Now the only object left is the fire extinguisher, which is the right choice. So the agent is rewarded with 2 points. The agent learns that in such a situation, picking up the fire extinguisher is the right decision to make.

State 4

State 4 of the Environment – Credits: Canva

Now say the agent takes the fire extinguisher and moves toward the wrench. Again, this does no harm to the agent but it still counts as a bad decision. Remember, our goal is to put out the fire efficiently. So the agent is punished with a single warning.

State 5

State 5 of the Environment – Credits: Canva

The agent finally moves toward the fire with the fire extinguisher and is rewarded with 3 points. This is how reinforcement learning works. Unlike in supervised learning, where a labeled dataset tells the model what the correct answer is, here the agent learns through trial and error.

In order to perform well, it has to fail, learn from its mistakes, and not repeat them. Sounds philosophical, right? Some actually believe that the agent is analogous to a baby, the environment is analogous to the world, and the process of reinforcement learning is how the baby grows.
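To make the fire example a little more concrete, here is a minimal Python sketch of it. Everything in it is an assumption made for illustration: the action names, the idea of tracking the state as "is the agent holding the extinguisher?", and the choice to treat each warning as -1 and each point as +1. It is not a full reinforcement learning algorithm, just the trial-and-error idea in its simplest form.

```python
# Hypothetical encoding of the fire example (names and numbers are assumptions).
# State: whether the agent is holding the fire extinguisher.
# Feedback: warnings are negative rewards, points are positive rewards.
FEEDBACK = {
    (False, "approach_fire"): -3,      # State 1: agent gets hurt, 3 warnings
    (False, "go_to_wrench"): -1,       # State 2: harmless but useless, 1 warning
    (False, "take_extinguisher"): 2,   # State 3: right choice, 2 points
    (True, "go_to_wrench"): -1,        # State 4: still useless, 1 warning
    (True, "approach_fire"): 3,        # State 5: fire is put out, 3 points
}

def environment_feedback(holding, action):
    """The environment's response; the agent cannot look inside this function."""
    return FEEDBACK[(holding, action)]

def available_actions(holding):
    """Actions the agent may try in the given state."""
    return [a for (h, a) in FEEDBACK if h == holding]

def learn_by_trial_and_error():
    """Try every action once in every state and remember the feedback."""
    memory = {}
    for holding in (False, True):
        for action in available_actions(holding):
            memory[(holding, action)] = environment_feedback(holding, action)
    return memory

def best_action(memory, holding):
    """Pick the action that earned the highest reward in this state."""
    tried = {a: r for (h, a), r in memory.items() if h == holding}
    return max(tried, key=tried.get)

memory = learn_by_trial_and_error()
print(best_action(memory, holding=False))  # take_extinguisher
print(best_action(memory, holding=True))   # approach_fire
```

Notice that the agent only finds out that approaching the fire empty-handed is a bad idea after it has tried it and collected the 3 warnings.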

Here is one video from Edureka.com on Reinforcement Learning

Reinforcement Learning Tutorial | Reinforcement Learning Example Using Python | Edureka

Now let us understand the differences between the three branches of Machine Learning.

Supervised Learning vs Unsupervised Learning vs Reinforcement Learning

| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
| --- | --- | --- |
| Uses a labeled dataset | Uses an unlabeled dataset | Does not use a dataset. Learns by interacting with the environment |
| Requires supervision | Doesn’t require supervision | It falls in between supervised and unsupervised learning; learns by an action-feedback mechanism |
| Uses pre-existing algorithms | Uses pre-existing algorithms | The agent has to start learning from scratch |
| Used to predict a defined target variable | Used to understand the patterns among data points | Used to make sequential decisions |
| An input value is mapped to a known set of output values | An input value is mapped to a set of unknown patterns identified | The trial and error method is used to identify the next optimal state |
| Used for Classification and Regression | Used for Clustering and Association rule mining | Used for Exploitation and Exploration |
| Ex: Linear Regression, KNN, Decision Trees, etc. | Ex: K-Means, K-Modes, Apriori, etc. | Ex: Q-Learning, SARSA, etc. |
A tabular representation of the differences between the various branches of Machine Learning

If you would like to learn more about Supervised and Unsupervised Learning methods, refer to our posts on Supervised Learning and Unsupervised Learning.

Now that we understand the differences between the 3 types of Machine Learning, let us dive a little deeper into Reinforcement Learning. (PS: Don’t worry I’ll keep it as simple as I can)

Here are some significant technical terms that are used in the field of Reinforcement Learning.

  • Agent – The entity that learns by interacting with the environment.
  • Environment – The world that the agent can interact with.
  • Action – A move that the agent can perform in the environment.
  • State – The condition of the environment at a given moment.
  • Policy – The mechanism used by the agent to choose the next action based on the current state of the environment.
  • Reward – Immediate feedback given to the agent, positive or negative, which indicates how good its previous action was.
  • Value – The long-term reward the agent can expect from a state, which may require a few sacrifices in the short term.
  • Action Value – Similar to value, but it also takes into account the action the agent is about to take in that state.
Diagrammatic representation of an Agent Learning

If you’d like to learn more about these terms, refer to the video linked above.
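Here is a small sketch of how these terms fit together in code. It uses Q-Learning, one of the algorithms named in the comparison table, on a made-up toy environment: a corridor of 5 positions where the agent starts at position 0 and earns a reward of 1 for reaching position 4. The corridor, the function names, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration, not something prescribed by this article.

```python
import random

N_STATES = 5                 # states 0..4; state 4 is the goal (toy assumption)
ACTIONS = ("left", "right")  # the actions available to the agent

def step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def choose_action(q, state, epsilon, rng):
    """Policy: epsilon-greedy over the action values of the current state."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore: try a random action
    best = max(q[(state, a)] for a in ACTIONS)
    # exploit: pick the best-known action, breaking ties at random
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn an action value for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Value of the next state: the best action value available there
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-Learning update: nudge the estimate toward reward + discounted future
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q_table = train()
greedy_policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)
}
print(greedy_policy)
```

The reward only shows up at the very last step, yet the learned action values make "right" look attractive even at position 0. That is the difference between reward (immediate) and value (long-term) in practice.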

Markov Decision Process

Any kind of machine learning technique, including Reinforcement Learning, needs a mathematical foundation to back up the intuition. This is where the MDP or Markov Decision Process comes in.

An MDP provides a mathematical framework for making decisions in an environment. It describes the environment in terms of states, actions, the probabilities of moving from one state to another, and rewards. Values can then be expressed as functions of states and actions, and a policy can be chosen based on them. To learn more about the mathematics behind this, refer to the link section.[2]
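For the curious, here is roughly what that looks like on paper. The notation below is the common textbook convention, not something defined in this article, so treat the symbols as assumptions: $S$ is the set of states, $A$ the set of actions, $P$ the transition probabilities, $R$ the reward function, $\gamma$ a discount factor between 0 and 1, and $\pi$ a policy.

```latex
% An MDP is usually written as a tuple:
(S, A, P, R, \gamma)

% Return: the discounted sum of future rewards from time step t
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% Value of a state and action value of a state-action pair under policy \pi
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]

% Bellman expectation equation: a state's value in terms of its successors
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]
```

The discount factor $\gamma$ is what makes a reward received now count for more than the same reward received later.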

Applications of Reinforcement Learning

  1. Natural Language Processing – NLP is a field that deals with human language in the form of text and speech. Reinforcement Learning is used in text summarization and in building chatbots, which have to mimic a human by making sequential decisions about how to reply to a message.
  2. Robotics – Many industries are working on training robots with Reinforcement Learning, by allowing the robot to interact with its surroundings and learn.
  3. Healthcare – Dynamic Treatment Regimes or DTRs are sequences of treatment decisions tailored to a patient’s changing condition. Reinforcement Learning is used to learn which treatment to choose at each step.
  4. Gaming – Agents are being trained to play games like chess. They learn through trial and error by interacting with the game environment.
  5. Trading and Marketing – Reinforcement Learning is being applied to the financial domain as well. Systems are used to make budgeting decisions, increase profit margins, and handle marketing campaigns.

Challenges in using Reinforcement Learning

  1. It is very computationally intensive compared to the other forms of learning, since it relies on trial and error
  2. If sufficient data is available, it is more efficient to use supervised or unsupervised learning
  3. Training the agent to an acceptable level is a time-consuming process
  4. Reinforcement Learning should only be used when we can afford to make mistakes
  5. It tends to struggle when the states or actions are high-dimensional


Although Reinforcement Learning is less popular than its siblings, it holds a huge potential that can be profitable when used appropriately. I hope this article helped you gain a basic understanding of RL.

If you wish to learn more about RL algorithms, you can refer to the following links:

  • https://www.javatpoint.com/reinforcement-learning
  • https://towardsdatascience.com/introduction-to-reinforcement-learning-markov-decision-process-44c533ebf8da
