A hidden Markov model (HMM) is one in which you observe a sequence of emissions, but do not know the sequence of states the model went through to generate them. We analyze a hidden Markov model to recover the sequence of states from the observed data.
You can read more about it in our earlier post here.
Now that you know what Hidden Markov Models are, let's try to understand them in more depth.
Basic Structure of HMM:
As we have discussed earlier, a Hidden Markov Model has the following parameters:
- A set of M hidden states
- A transition probability matrix (A)
- A sequence of T observations
- An emission probability matrix, also known as the observation likelihoods (B)
- An initial probability distribution (π)
Central Issues with HMM
The first issue is to find the probability of an observed sequence: the Forward Algorithm.
The second issue is to find the most probable sequence of hidden states: the Viterbi Algorithm and the Forward-Backward Algorithm.
The third issue is to find the parameters that maximize the likelihood of the observed data: the Baum-Welch Algorithm.
Assumptions about HMM
Since the HMM is an augmented form of the Markov model, the following assumptions hold:
The future state depends only on the current state. It is expressed as follows:
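In standard notation, with qi denoting the hidden state at step i, this Markov assumption can be written as:

```latex
P(q_i \mid q_1, q_2, \ldots, q_{i-1}) = P(q_i \mid q_{i-1})
```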
The probability of an output observation (oi) depends only on the state qi that produced it, and not on any other states or any other observations. It is expressed as follows:
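In the same notation, this output-independence assumption can be written as:

```latex
P(o_i \mid q_1, \ldots, q_T, o_1, \ldots, o_T) = P(o_i \mid q_i)
```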
Hidden Markov Models Algorithms
Let us take a simple HMM with two hidden states representing the weather in a town. We also know a kid who carries one of two items, a hat or an umbrella, depending on the weather. The relationship between the kid's items and the weather is shown by the orange and blue arrows, while the black arrows represent the transitions between states.
Suppose we know the following sequence of items
We use the forward algorithm to find the probability of the observed sequence when the parameters of the HMM are known: the transition matrix, the emission matrix, and the initial distribution (π).
We could find all the possible hidden state sequences that may lead to this observation sequence and compute the cumulative probability of each; in total there are 8 such sequences. Calculating the probability of each sequence is a tedious task: computing the joint probabilities directly would require marginalizing over all possible state sequences, whose number grows exponentially with the length of the sequence. Instead, the forward algorithm takes advantage of the conditional independence rules of the hidden Markov model to perform the calculation recursively.
The probability of the observation sequence is a sum over all the possible sequences of hidden states (s), which can be represented as above.
Recursively, it is expressed as follows, where t indexes the position in the sequence and s represents the hidden state.
For example, if t=1,
Now, to get the final answer, we compute αT for every hidden state and sum over the states. It is represented as follows:
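In standard notation, with a for the transition probabilities, b for the emission probabilities, and π for the initial distribution, the full forward recursion reads:

```latex
\alpha_1(s) = \pi(s)\, b_s(o_1), \qquad
\alpha_t(s) = b_s(o_t) \sum_{s'} \alpha_{t-1}(s')\, a_{s' s}, \qquad
P(O) = \sum_{s} \alpha_T(s)
```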
In the Forward Algorithm, we use the probabilities computed at the current time step to derive the probabilities of the next time step, which makes it computationally efficient.
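As a minimal sketch, the forward pass for a toy two-state weather HMM might look like this in Python; all the probabilities below are invented for illustration and are not taken from the article's diagram:

```python
import numpy as np

# Toy HMM (numbers are assumptions for illustration):
# states 0 = rainy, 1 = sunny; observations 0 = umbrella, 1 = hat
A = np.array([[0.7, 0.3],      # A[i, j] = P(next state j | current state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs, A, B, pi):
    """Return alpha, where alpha[t, i] = P(o_1..o_t, state_t = i)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # base case (t = 1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # recursion step
    return alpha

alpha = forward([0, 0, 1], A, B, pi)   # umbrella, umbrella, hat
print(alpha[-1].sum())                 # P(observed sequence)
```

Summing the last row of the trellis gives the total probability of the observation sequence, which is exactly the quantity the forward algorithm is designed to compute.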
The forward algorithm is mostly used in applications where we need the probability of being in a specific state given the sequence of observations. We first compute the probabilities over the states for the previous observation, use them for the current observation, and then extend them to the next step using the transition probability table. The approach caches all the intermediate state probabilities, so each is computed only once. Combined with the analogous backward probabilities, this per-state inference is also called posterior decoding.
The Backward Algorithm is the time-reversed version of the Forward Algorithm. In the Backward Algorithm, we find the probability that the machine is in a hidden state at time step t and then generates the remaining part of the sequence. Mathematically, it is represented as:
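In standard notation, the backward variable and its recursion are:

```latex
\beta_t(s) = P(o_{t+1}, \ldots, o_T \mid q_t = s), \qquad
\beta_T(s) = 1, \qquad
\beta_t(s) = \sum_{s'} a_{s s'}\, b_{s'}(o_{t+1})\, \beta_{t+1}(s')
```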
The forward-backward algorithm is an inference algorithm that computes, for every hidden state variable, its posterior distribution given the full observation sequence. This inference task is usually called smoothing. The algorithm uses dynamic programming to efficiently compute, in two passes, the values required to obtain the posterior marginal distributions: the first pass goes forward in time while the second goes backward in time, hence the name forward-backward algorithm. It is a combination of the forward and backward algorithms explained above. The algorithm also yields the probability of an observation sequence given the HMM, which can be used to classify observation sequences in recognition applications.
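A sketch of the smoothing computation, reusing the same kind of toy two-state HMM (all the numbers are illustrative assumptions):

```python
import numpy as np

# Toy two-state HMM; all probabilities are illustrative assumptions.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def forward_backward(obs, A, B, pi):
    """Return gamma, where gamma[t, i] = P(state_t = i | o_1..o_T)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):                    # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                              # unnormalized posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)

gamma = forward_backward([0, 0, 1], A, B, pi)
print(gamma)   # one row per time step; each row sums to 1
```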
For an HMM, which contains hidden variables, the task of determining which sequence of variables is the most likely underlying source of some sequence of observations is called the decoding task. The task of a decoder is to find the best hidden sequence given a trained model and some observed data. More formally, given as input an HMM with matrices A and B and a sequence of observations O = o1, o2, …, oT, find the most probable sequence of states S = s1, s2, s3, …, sT.
We could, of course, score all the possible state sequences and pick the best one by maximum likelihood, but this is infeasible since the number of state sequences grows exponentially.
Like the forward algorithm, the Viterbi algorithm fills in a trellis of values vt(j): the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence s1, s2, s3, …, st−1. The value of each cell vt(j) is computed recursively by taking the most probable path that could lead us to this cell. Formally, each cell expresses the following probability
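In standard notation, the Viterbi recursion is:

```latex
v_1(j) = \pi_j\, b_j(o_1), \qquad
v_t(j) = \max_{i}\; v_{t-1}(i)\, a_{ij}\, b_j(o_t)
```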
We can understand this in a better way via an example,
Suppose we have an HMM model as shown in the diagram, with states sun (x), cloud (y), and snow (z); we always start with sun and end with snow. You have already seen the observed sequence in the Forward algorithm problem. What is the most likely state sequence that would generate this path?
The options are:
In this case, the bold path is the Viterbi path. You can see this by working backward from the last state: 0.3*0.4*0.136 > 0.7*0.4*0.024
Similarly, 0.4*0.5*0.4 > 0.7*0.4*0.2
Thus the most probable sequence is xxyz.
Please note that there can be multiple possible best sequences.
As every state depends only on the state before it, you can build the most likely path step by step. At every step, you calculate how likely it is that you end up in state x, state y, or state z; after that, it does not matter how you got there.
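The step-by-step maximization described above can be sketched as follows; the two-state HMM and its probabilities are assumptions for illustration, not the values from the diagram:

```python
import numpy as np

# Toy two-state HMM; the probabilities are illustrative assumptions.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def viterbi(obs, A, B, pi):
    """Return the most probable hidden state sequence for obs."""
    T, N = len(obs), A.shape[0]
    v = np.zeros((T, N))                 # v[t, j] = best path prob ending in j
    back = np.zeros((T, N), dtype=int)   # back[t, j] = best predecessor of j
    v[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = v[t - 1][:, None] * A            # scores[i, j] = v[t-1, i] * A[i, j]
        back[t] = scores.argmax(axis=0)
        v[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(v[-1].argmax())]                  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1], A, B, pi))
```

Keeping only the best predecessor per state at each step is what saves us from enumerating all the exponentially many paths.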
The Baum-Welch algorithm answers the question "Under what parameterization is the observed sequence most probable?"
It is a special case of the expectation-maximization (EM) algorithm and makes use of the forward-backward algorithm to compute the statistics for the expectation step. Its purpose is to tune the parameters of the HMM, namely the state transition matrix, the emission matrix, and the initial state distribution.
- Start with random initial estimates of the transition and emission matrices.
- Compute the expectation of how often each transition/emission has been used; this estimates the latent variables [ξ, γ].
- Re-calculate the probabilities of the transition and emission matrices based on those estimates.
- Repeat until convergence.
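The four steps above can be sketched as follows; the observation sequence and the random initialization are assumptions for illustration:

```python
import numpy as np

# Baum-Welch sketch for a two-state, two-symbol HMM.
# The observation sequence and initialization are illustrative assumptions.
rng = np.random.default_rng(0)
obs = np.array([0, 0, 1, 0, 1, 1, 0, 0])
N, K = 2, 2

# Step 1: initial estimates of the parameters
A = np.full((N, N), 1.0 / N)
B = rng.random((N, K))
B /= B.sum(axis=1, keepdims=True)
pi = np.full(N, 1.0 / N)

for _ in range(50):
    T = len(obs)
    # Step 2 (E-step): forward-backward pass, then expected counts gamma, xi
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                  # P(state_t = i | obs)
    xi = (alpha[:-1, :, None] * A[None, :, :]          # P(state_t = i, state_t+1 = j | obs)
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # Step 3 (M-step): re-estimate pi, A, B from the expected counts
    pi = gamma[0]
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    for k in range(K):
        B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    # Step 4: in practice, stop when the likelihood stops improving

print(A)
print(B)
```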
γt(i) is the probability of being in state i at time t, given the observed sequence Y and the parameters θ.
ξt(i, j) is the probability of being in state i at time t and state j at time t+1, given the observed sequence Y and the parameters θ.
The parameters (A, B) are updated as follows:
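In standard notation, the latent variables and the resulting updates are:

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j}\alpha_t(j)\,\beta_t(j)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(Y \mid \theta)}
```

```latex
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_j(k) = \frac{\sum_{t\,:\,o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad
\hat{\pi}_i = \gamma_1(i)
```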
Some links in case you want to read more about the topic: