Forward-backward algorithm

From Wikipedia, the free encyclopedia

The forward-backward algorithm is a dynamic programming algorithm for computing the probability of a particular observation sequence, given the parameters of the model, in the context of hidden Markov models.

Contents

[edit] Overview

The algorithm is composed of three steps: Computing forward probabilities, computing backward probabilities and computing smoothed values. The forward and backward steps are often called forward message pass and backward message pass. The wording originates from the way the algorithm processes the given observation sequence. First the algorithm moves forward starting with the first observation in the sequence and going to the last, and then returning back to the first. At each single observation in the sequence probabilities to be used for calculations at the next observation are computed. During the backward pass the algorithm simultaneously performes the smoothing step. This step allows the algorithm to take into account any past observations of output for computing more accurate results.

[edit] Formal description

The following description will be based on matrices of probability values rather than on probability distributions. We transform the probability distributions related to a given hidden Markov model into matrix notation as follows. The transition probabilities \mathbf{P}(X_t\mid X_{t-1}) of a given random variable Xt representing all possible states in the hidden Markov model will be represented by the matrix \mathbf{T} where the row index, i, will represent the start state and the column index, j, represents the target state. For example, in the example below \mathbf{T} would be defined as:

\mathbf{T} = \begin{pmatrix}
  0.7 & 0.3 \\
  0.3 & 0.7 
\end{pmatrix}

Similarly, we will represent the probabilities of a new state, given some observed states as evidence, in an observation matrix \mathbf{O_t} where each diagonal entry contains the probability of a new state given the particular observed state at t as evidence. Note that t indicates a particular observation in the observation sequence. All other entries in the matrix will be zero. For example, in the example below \mathbf{O} the first observed evidence (t=1) is "umbrella". Therefore \mathbf{O_1} would be defined as:

\mathbf{O_1} = \begin{pmatrix}
  0.9 & 0.0 \\
  0.0 & 0.2 
\end{pmatrix}

This allows a very elegant description of how the next forward probability is computed. Let the set of forward probabilities be stored in yet another matrix \mathbf{f_{1:t+1}}. Where 1:t+1 indicates that the computed probabilities are depending on all forward probabilities from 1 to t+1 including the current one which we will describe with \mathbf{f_{1:t}}. Hence, \mathbf{f_{1:t+1}} is equal to the multiplication of the transposed transformation matrix with the current forward probabilities and the observation matrix for the next evidence observation. Finally the resulting matrix needs to be normalized, i.e. the resulting values in the matrix are divided by the sum of all values in the matrix. The normalization factor is described with α. Therefore the resulting forward computation is described by the following general formula:

\mathbf{f_{1:t+1}} = \alpha\mathbf{O_{t+1}}\mathbf{T^T}\mathbf{f_{1:t}}

The backward computation \mathbf{b_{k+1:t}}, that is starting at the end of the sequence, can be described in a similar manner. Let the end of the sequence be described by the index k, starting at 0. Therefore running from k down to t=0 and calculating each backward probability can be described by the following formula:

\mathbf{b_{k+1:t}} = \alpha(\mathbf{T}\mathbf{O_{k+1}}\mathbf{b_{k+2t}})\times\mathbf{f_{1:t}}

Note that we use the non-transposed matrix of \mathbf{T} and that the order of the terms has changed. Also note that the final product is not a usual matrix multiplication, but a point product. This operation multiplies each value in one matrix with the corresponding value of the other. Finally note that the description in Russel & Norvig 2003 pp. 550 excludes the point product, thought the procedure is required earlier. The third and final step is the computation of smoothed probabilities \mathbf{svt}. These are the point product of the backward and forward probabilities for each t. Therefore the formular is defined as:

\mathbf{sv_t} = \alpha\mathbf{b_{k+1:t}}\times\mathbf{f_{1:t}}

The example in the following section will show numerically how each of these matrices are calculated.

[edit] Example

This example is based on the umbrella world in Russel & Norvig 2003 pp. 540. The example uses a longer observation sequence {umbrella, umbrella, no umbrella, umbrella, umbrella} than the example described in the book. Assume the probabilities for rain given an umbrella observation are 90% if an umbrella is observed and 20% if no umbrella is observed. Further more assume that the probability of rain in t+1 given that it is raining in now (i.e. t) is 70% and the probability that it does not rain is 30%. The following matrices can be extracted from the umbrella world given the above observations:


\mathbf{O_1} = \begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix} \mathbf{O_2} = \begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\mathbf{O_3} = \begin{pmatrix}0.1 & 0.0 \\  0.0 & 0.8 \end{pmatrix}\mathbf{O_4} = \begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\mathbf{O_5} = \begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}

\mathbf{T} = \begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix} \mathbf{T^T} = \begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}

Note that \mathbf{O_3} is different from the others because of the "no umbrella" observation. Also note that \mathbf{T} and \mathbf{T^T} are equal because the are symmetrical. before we can start to compute forward messages we need to describe two special values, the first forward probability and the k+2 backward probability. The first forward probability at t=0 is defined by the prior of the random variable. The k+2 backward probability is defined by the "true" matrix. Therefore if follows:


\mathbf{f_{1:0}}= \begin{pmatrix}  0.5 & 0.5 \end{pmatrix}

\mathbf{b_{k+2t}} = \begin{pmatrix}  1.0 & 1.0\end{pmatrix}

Now we will first iterate through all t and compute the forward probabilities. The following matrices are appearing in the same order as described in the forward formula above. Some calculations may be less accurate due to the limited numbers of decimals used in this example.


\mathbf{f_{1:1}} = \alpha\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.5000 & 0.5000 \end{pmatrix}=\alpha\begin{pmatrix}0.4500 & 0.1000\end{pmatrix}=\begin{pmatrix}0.8182 & 0.1818 \end{pmatrix}

\mathbf{f_{1:2}} = \alpha\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.8182 & 0.1818 \end{pmatrix}=\alpha\begin{pmatrix}0.5645 & 0.0745\end{pmatrix}=\begin{pmatrix}0.8834 & 0.1165 \end{pmatrix}

\mathbf{f_{1:3}} = \alpha\begin{pmatrix}0.1 & 0.0 \\  0.0 & 0.8 \end{pmatrix}\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.8834 & 0.1165 \end{pmatrix}=\alpha\begin{pmatrix}0.0653 & 0.2772\end{pmatrix}=\begin{pmatrix}0.1906 & 0.8093 \end{pmatrix}

\mathbf{f_{1:4}} = \alpha\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.1906 & 0.8093 \end{pmatrix}=\alpha\begin{pmatrix}0.3386 & 0.1247\end{pmatrix}=\begin{pmatrix}0.7308 & 0.2691 \end{pmatrix}

\mathbf{f_{1:5}} = \alpha\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.7308 & 0.2691 \end{pmatrix}=\alpha\begin{pmatrix}0.5331 & 0.0815\end{pmatrix}=\begin{pmatrix}0.8673 & 0.1326 \end{pmatrix}

Now that the forward probabilities are defined, we continue to compute the backward probabilities. Again the matrices appear in the order as in the backward formula above.


\mathbf{b_{5:5}}  = \alpha(\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}1.0000 & 1.0000 \end{pmatrix})\times \begin{pmatrix}0.8673 & 0.1326 \end{pmatrix}=\alpha\begin{pmatrix}0.5984 & 0.0543\end{pmatrix}=\begin{pmatrix}0.9168 & 0.0831 \end{pmatrix}

\mathbf{b_{4:5}}  = \alpha(\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}0.9168 & 0.0831 \end{pmatrix})\times \begin{pmatrix}0.7308 & 0.2691 \end{pmatrix}=\alpha\begin{pmatrix}0.7308 & 0.2691\end{pmatrix}=\begin{pmatrix}0.8593 & 0.1407 \end{pmatrix}

\mathbf{b_{3:5}}  = \alpha(\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.1 & 0.0 \\  0.0 & 0.8 \end{pmatrix}\begin{pmatrix}0.8593 & 0.1407 \end{pmatrix})\times \begin{pmatrix}0.1906 & 0.8093 \end{pmatrix}=\alpha\begin{pmatrix}0.0178 & 0.0845\end{pmatrix}=\begin{pmatrix}0.1739 & 0.8260 \end{pmatrix}

\mathbf{b_{2:5}}  = \alpha(\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}0.1739 & 0.8260 \end{pmatrix})\times \begin{pmatrix}0.8834 & 0.1165 \end{pmatrix}=\alpha\begin{pmatrix}0.1405 & 0.0189\end{pmatrix}=\begin{pmatrix}0.8814 & 0.1185 \end{pmatrix}

\mathbf{b_{1:5}}  = \alpha(\begin{pmatrix}  0.7 & 0.3 \\  0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.9 & 0.0 \\  0.0 & 0.2 \end{pmatrix}\begin{pmatrix}0.8814 & 0.1185 \end{pmatrix})\times \begin{pmatrix}0.8182 & 0.1818 \end{pmatrix}=\alpha\begin{pmatrix}0.4600 & 0.0462\end{pmatrix}=\begin{pmatrix}0.9087 & 0.0912 \end{pmatrix}

Finally we will compute the smoothed probability values. The ordering of the matrices follows the smoothed values formula above.


\mathbf{sv_{5}}  = \alpha\begin{pmatrix}1.0000 & 1.0000 \end{pmatrix}\times \begin{pmatrix}0.8673 & 0.1326 \end{pmatrix}=\alpha\begin{pmatrix}0.8673 & 0.1326 \end{pmatrix}=\begin{pmatrix}0.8673 & 0.1326 \end{pmatrix}

\mathbf{sv_{4}}  = \alpha\begin{pmatrix}0.9168 & 0.0831 \end{pmatrix}\times \begin{pmatrix}0.7308 & 0.2691 \end{pmatrix}=\alpha\begin{pmatrix}0.6699 & 0.0223\end{pmatrix}=\begin{pmatrix}0.9677 & 0.322 \end{pmatrix}

\mathbf{sv_{3}}  = \alpha\begin{pmatrix}0.8593 & 0.1407 \end{pmatrix}\times \begin{pmatrix}0.1906 & 0.8093 \end{pmatrix}=\alpha\begin{pmatrix}0.1637 & 0.1138\end{pmatrix}=\begin{pmatrix}0.5899 & 0.4101 \end{pmatrix}

\mathbf{sv_{2}}  = \alpha\begin{pmatrix}0.1739 & 0.8260 \end{pmatrix}\times \begin{pmatrix}0.8834 & 0.1165 \end{pmatrix}=\alpha\begin{pmatrix}0.1536 & 0.0962\end{pmatrix}=\begin{pmatrix}0.6148 & 0.3852 \end{pmatrix}

\mathbf{sv_{1}}  = \alpha\begin{pmatrix}0.8814 & 0.1185 \end{pmatrix}\times \begin{pmatrix}0.8182 & 0.1818 \end{pmatrix}=\alpha\begin{pmatrix}0.7211 & 0.0215\end{pmatrix}=\begin{pmatrix}0.9710 & 0.0289 \end{pmatrix}

[edit] Performance

The brute force procedure for the solution of this problem is the generation of all possible sequences of observed events and hidden states with their probabilities using the two transition matrices. The joint probability of two sequences, given the model, is calculated by multiplying the corresponding probabilities. The brute force procedure has time complexity  O(T \cdot N^T) , where T is the length of sequences and N is the number of symbols in the state alphabet. This is intractable for realistic problems, as the number of possible hidden node sequences typically is extremely high. However, the forward-backward algorithm has time complexity  O(N^2 T)\, . Several enhancements are known to the general forward-backward algorithm which allow for computations to take place in constant space. In addition, as t grows, algorithms have been developed to compute \mathbf{f_{1:t+1}} efficiently through online smoothing such as the fixed-lag smoothing algorithm Russel & Norvig 2003 pp. 552.

[edit] Pseudo code

ForwardBackward(guessState, sequenceIndex):
    if sequenceIndex is past the end of the sequence, return 1
    if (guessState, sequenceIndex) has been seen before, return saved result
    result = 0
    for each neighboring state n:
        result = result + (transition probability from guessState to n given observation element at sequenceIndex)*ForwardBackward(n, sequenceIndex+1)
    save result for (guessState, sequenceIndex)
    return result

[edit] See also

[edit] References

  • Russel S., Norvig P. (2003). Artificial Intelligence A Modern Approach 2nd Edition. Pearson Education. ISBN 0-13-080302-2. 

[edit] External links

Languages