Hidden Markov Models Algorithms

A hidden Markov model (HMM) is one in which you observe a sequence of emissions, but do not know the sequence of states the model went through to generate the emissions. We analyze the hidden Markov models to recover the sequence of states from the observed data.

You can read more about it from our earlier post here.

You know what Hidden Markov Models are, now let’s try to understand in depth about them

Basic Structure of HMM:

As we have discussed earlier, Hidden Markov Model has with following parameters :

  • Set of M Hidden States
  • A Transaction Probability Matrix (A)
  • A sequence of t observations
  • A Emission Probability Matrix (Also known as Observation Likelihood) (B)
  • An Initial Probability Distribution

Central Issues with HMM


The first issue is to find the probability of an observed sequence.- Forward Algorithm


The second issue is to find the most probable sequence of hidden states- Viterbi Algorithm, Forward-Backward Algorithm


The third issue is to find parameters that maximize the likelihood of observed data- Baum-Welch Algorithm

Assumptions about HMM

Markov Assumption

Since the HMM is an augmented form of the Markov Model, the following assumption holds true:

The future state is going to be dependent only on the current state. It is expressed as follows:

P(qi = a|q1…qi−1) = P(qi = a|qi−1)
Markov assumption

Output independance

The probability of an output observation(oi) depends only on the state that produced the observation qi and not on any other states or any other observations. It is expressed as follows:

P(oi|q1...qi.......qT,o1...oi......oT)= P(oi|qi)
Output Independance

Hidden Markov Models Algorithms

Forward Algorithm

Let us have a simple HMM, we have two hidden states, they represent the weather in a town, we also know a kid who can carry either of 2 things, a hat and an umbrella depending on the weather. The relationship between the kid’s items and the weather is shown by the orange and blue arrows. The black arrows represent the transition of states.

HMM Model
A HMM Model

Suppose we know the following sequence of items

A sample sequence
A Sequence

We use the forward algorithm to find the probability of the observed sequence given the parameters of HMM are known which are the transition matrix, emission matrix, and stationary distribution(π).

A sequence with potential hidden states
A sequence with potential hidden states

We find all the possible hidden states that may lead us to this sequence. We find the cumulative probability of each sequence. We find that in total there are 8 such sequences. Now calculating the probabilities of each sequence is going to be a tedious task. Computing the joint probabilities directly would require marginalizing over all possible state sequences, the number of which grows exponentially with the increase in length of the sequence. Instead, the forward algorithm takes advantage of the conditional independence rules of the hidden Markov model to perform the calculation recursively.

P(o1,o2..oT)=Sigma P(o1,o2..oT,os)
Probability of a sequence

The probability of a sequence is a sum of all the possible sequences of the hidden states(s), which can be represented as above.

Recursively, it is expressed as follows, with t being the length of the sequence and s representing the hidden state.

Recursive expression of the probability
Recursive expression of the probability

For example, if t=1,


Now, to get the final answer we find the αt for every hidden state and sum it. It is represented as follows:

Final expression for the forward algorithm
The final expression for the forward algorithm

In Forward Algorithm, we use the computed probability on the current time step to derive the probability of the next time step. Hence it is computationally more efficient.

The forward algorithm is mostly used in applications that need us to determine the probability of being in a specific state when we know about the sequence of observations. We first calculate the probabilities over the states computed for the previous observation and use them for the current observations, and then extend it out for the next step using the transition probability table. The approach basically caches all the intermediate state probabilities so they are computed only once. This helps us to compute a fixed state path. The process is also called posterior decoding.

Backward Algorithm

The backward Algorithm is the time-reversed version of the Forward Algorithm. In the Backward Algorithm, we need to find the probability that the machine will be in a hidden state at time step t and will generate the remaining part of the sequence. Mathematically, it is represented as:

Expression for backward algorithm
Expression for backward algorithm

Forward-Backward Algorithm

The forward-backward algorithm is an inference algorithm that computes for all hidden state variables, the distribution. This inference task is usually called smoothing. The algorithm makes use of the principle of dynamic programming to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name forward-backward algorithm. It is basically a combination of the forward and backward algorithms that have been explained above. This algorithm computes the probability of an observation sequence given the HMM. This probability can be used to classify observation sequences in recognition applications.

Viterbi algorithm

For an HMM, that contains hidden variables, the task of determining which sequence of variables is the most likely underlying source of some sequence of observations is called the decoding task. The task of a decoder is to find the best-hidden sequence when there is a trained model and some observed data. More formally, given as input an as A and B and a sequence of observations O = o1, o2, …, oT , find the most probable sequence of states S = s1, s2, s3 . . .sT .

We can of course use the forward algorithm to find all the possible sequences and pick the best one on the maximum likelihood, but this can not be done since there are an exponentially large number of state sequences.

Like the forward algorithm, it estimates the probability that the vt(j) is in state j after seeing the first t observations and passing through the most likely state sequence s1, s2, s3 ……..sT. The value of each cell vt( j) is computed recursively taking the most probable path that could lead us to this cell. Formally, each cell expresses the following probability

We can understand this in a better way via an example,

HMM Model

Suppose we have an HMM model as shown in the diagram. We always start with the sun(x), cloud(y), and end with the snow(z). Now, you have observed the sequence in the Forward algorithm problem. What is the most likely sequence that would generate this path?

The options are:

  • xxyz
  • xyyz
  • xyzz
Manual Calculations
Manual Calculations

In this case, the bold path is the Viterbi path. You can see this when you get back from the last state: 0.3*0.4*0.136 > 0.7*0.4*0.024

Similarly, 0.4*0.5*0.4 > 0.7*0.4*0.2

Thus the most probable sequence is xxyz.

Please note that there can be multiple possible best sequences.

As every state depends only on the state before, you can get the most likely path step by step. In every step, you calculate how likely it is that you end up in state x, state y, state z. After that, it doesn’t matter how you got there. 

Baum-Welch Algorithm

It provides an answer to the question “Under what parameterization is the observed sequence most probable?”

Baum-Welch algorithm is a special case of the expectation-maximization(EM) algorithm. It makes use of the forward-backward algorithm to compute the statistics for a particular step. Its purpose is to tune the parameters of the HMM, namely the state transition matrix, the emission matrix, and the initial state distribution.

  • Start with random initial estimates of the transition and emission matrix.
  • Compute expectation of how often each transition/emission has been used. We will estimate latent variables [ ξ , γ ]
  • Re-calculate the probabilities of the transition and emission matrix based on those estimates.
  • Repeat until convergence
Expression for latent variable γ
Expression for latent variable γ

γ is the probability of being in state i at the time  given the observed sequence Y and parameters theta

Expression for latent variable ξ
Expression for latent variable ξ

ξ is the probability of being in states i,j at the time  given the observed sequence Y and parameters theta

The parameters (A,B) are updated as follows

Transition Matrix Formula
Transition Matrix Formula
Emission Matrix Formula

Some links in case you want to read more about the topic:

  1. https://youtube.com/playlist?list=PLM8wYQRetTxBkdvBtz-gw8b9lcVkdXQKV
  2. https://www.cs.cmu.edu/~aarti/Class/10701_Spring14/slides/HMM.pdf

Similar Posts

Leave a Reply

Your email address will not be published.