Markov series III: the hidden Markov model learning problem and the Baum-Welch algorithm

While reposting the original article, I found some parts of it hard to follow, so I read other articles and then added a code implementation.

Contents

1, Training data includes both the observation sequence and the state sequence

1. Calculation of initial probability

2. Calculation of transition probability

3. Calculation of emission probability

2, Only the observation sequence in the training data

Baum-Welch algorithm: solving for π

Baum-Welch algorithm: solving for A

Baum-Welch algorithm: solving for B

Code implementation

If the training data contains both the observation sequence and the state sequence, the HMM learning problem is simple: it is a supervised learning problem.
If the training data contains only the observation sequence, the HMM learning problem must be solved with the EM algorithm, making it an unsupervised learning problem.

1, Training data includes both the observation sequence and the state sequence

Using the law of large numbers, in the sense that "the limit of frequency is probability", the parameter estimates of the HMM can be written down directly as frequency counts.

The individual formulas are explained below.

1. Calculation of initial probability

In the formula, |S_i| is the number of occurrences of state i in the training data, and Σ_k|S_k| is the total number of states over all time points.
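The formula itself appears as an image in the original post; reconstructed from the description above (with k as the summation index to avoid ambiguity), it is:

$$\hat{\pi}_i = \frac{|S_i|}{\sum_{k}|S_k|}$$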

2. Calculation of transition probability

In the formula, a_ij is the probability of transitioning from state i to state j; |S_ij| is the number of times state i transitions to state j; and the denominator is the total number of transitions out of state i over all time points.

Note that the original formula has a small problem: the summation index in the denominator reuses the letter j from S_ij. It should be changed to another letter, such as k, so that there is no ambiguity; the reconstruction below is written that way.
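Reconstructed from the description (the original image is not reproduced here), the transition estimate reads:

$$\hat{a}_{ij} = \frac{|S_{ij}|}{\sum_{k}|S_{ik}|}$$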

3. Calculation of emission probability

b_jk is estimated from the number of times observation k is emitted from the current state j.
For example, the probability of observing a white ball from the state "box 1" is b_{1,white}:
b_{1,white} = count(box 1 → white) / [count(box 1 → white) + count(box 1 → black)]
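In general notation, the same counting idea gives the following (a reconstruction, since the original formula image is not reproduced; |q_j → o_k| denotes the number of times observation o_k is emitted from state q_j, and m is the summation index):

$$\hat{b}_{j}(k) = \frac{|q_j \to o_k|}{\sum_{m}|q_j \to o_m|}$$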

As with the transition formula, the summation index in the denominator of the original emission formula reuses the letter j from q_ij; a different letter, such as k (or m, as above), makes it easier to read. A code sketch of the whole counting procedure is given below.
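The following is a minimal sketch of the supervised counting estimation described in this section. It assumes the training data is given as paired lists of integer-coded state sequences and observation sequences; the names supervised_hmm_estimate, state_seqs, obs_seqs, n_states, n_obs are illustrative and not from the original post.

import numpy

def supervised_hmm_estimate(state_seqs, obs_seqs, n_states, n_obs):
    #Frequency-count estimates of Pi, A, B from paired state/observation sequences
    pi_counts = numpy.zeros(n_states)
    a_counts = numpy.zeros((n_states, n_states))
    b_counts = numpy.zeros((n_states, n_obs))
    for states, obs in zip(state_seqs, obs_seqs):
        for t in range(len(states)):
            pi_counts[states[t]] += 1            # |S_i|: occurrences of state i
            b_counts[states[t], obs[t]] += 1     # emissions of obs[t] from state states[t]
            if t + 1 < len(states):
                a_counts[states[t], states[t + 1]] += 1   # |S_ij|: transitions i -> j
    #"The limit of frequency is probability": normalize the counts
    pi = pi_counts / pi_counts.sum()
    a = a_counts / a_counts.sum(axis=1, keepdims=True)
    b = b_counts / b_counts.sum(axis=1, keepdims=True)
    return pi, a, b

#Example call with two short integer-coded sequences:
#pi, a, b = supervised_hmm_estimate([[0, 1, 0], [1, 1, 0]], [[2, 0, 1], [1, 1, 0]], 2, 3)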

2, Only the observation sequence in the training data

When the training data contains only the observation sequence and no state sequence, the Baum-Welch algorithm must be used. This post does not explain the Baum-Welch algorithm in full detail; for the specific derivation and implementation, refer to other posts such as "Detailed explanation of the Baum-Welch algorithm for the hidden Markov model" and "From EM algorithm to Baum Welch algorithm". The derivation of the Baum-Welch algorithm requires the EM algorithm and the results of the forward and backward algorithms from the HMM probability (evaluation) problem.

Following the E-step of the EM algorithm, we obtain the Q function; the M-step then maximizes it.
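The Q function is shown as an image in the original post; in its standard form (where \bar{\lambda} denotes the current parameter estimate and I a hidden state sequence) it is

$$Q(\lambda,\bar{\lambda}) = \sum_{I} \log P(O, I \mid \lambda)\, P(O, I \mid \bar{\lambda}) = \sum_{I}\Big[\log \pi_{i_1} + \sum_{t=1}^{T-1}\log a_{i_t i_{t+1}} + \sum_{t=1}^{T}\log b_{i_t}(o_t)\Big] P(O, I \mid \bar{\lambda})$$

which splits into three terms, one involving only π, one only A, and one only B, so each can be maximized separately.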

 

Baum-Welch algorithm: solving for π

To maximize L, use the Lagrange multiplier method (with the constraint that the π_i sum to 1) to solve for π:
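The resulting re-estimate (the standard result, reconstructed here since the original formula image is not shown) is

$$\pi_i = \frac{P(O, i_1 = i \mid \bar{\lambda})}{P(O \mid \bar{\lambda})} = \gamma_1(i)$$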

Baum-Welch algorithm: solving for A

To maximize L, use the Lagrange multiplier method (with each row of A constrained to sum to 1) to solve for a_ij:
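The standard re-estimate is

$$a_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}$$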

Baum-Welch algorithm: solving for B

To maximize L, use the Lagrange multiplier method (with each row of B constrained to sum to 1) to solve for b_j(k):
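The standard re-estimate is

$$b_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}$$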

By maximizing the L functions in this way, the Baum-Welch algorithm obtains the re-estimated values of π, A, and B.

In these formulas:
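γ and ξ are defined in terms of the forward variables α and backward variables β of the forward-backward algorithm (these are the quantities the code below calls gamma and zeta):

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)}, \qquad \xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}$$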

Code implementation

The code below is taken from the post "From EM algorithm to Baum Welch algorithm"; read it together with the post "HMM model of the university canteen (3): Baum-Welch algorithm". I have not had time to verify this code yet and will do so when I get the chance. For now, understanding the theory of HMM and moving on to CRF is what matters most to me.

#!/usr/bin/env python  
# -*- coding:utf-8 -*-  
import numpy


'''
Created on 2017 February 9th 2013

@author: Xue Lei Lei
'''


class HMM:
    def __init__(self,A,B,Pi):
        #Store the parameters as numpy arrays: transition matrix A,
        #emission matrix B and initial state distribution Pi
        self.A=numpy.asarray(A,dtype=float)
        self.B=numpy.asarray(B,dtype=float)
        self.Pi=numpy.asarray(Pi,dtype=float)

    #Forward algorithm
    def forward(self,O):
        row=self.A.shape[0]
        col=len(O)
        alpha=numpy.zeros((row,col))
        #initial value
        alpha[:,0]=self.Pi*self.B[:,O[0]]
        #Recurrence
        for t in range(1,col):
            for i in range(row):
                alpha[i,t]=numpy.dot(alpha[:,t-1],self.A[:,i])*self.B[i,O[t]]
        #termination
        return alpha

    #Backward algorithm
    def backward(self,O):
        row=self.A.shape[0]
        col=len(O)
        beta=numpy.zeros((row,col))
        #initial value
        beta[:,-1:]=1
        #Recurrence
        for t in reversed(range(col-1)):
            for i in range(row):
                beta[i,t]=numpy.sum(self.A[i,:]*self.B[:,O[t+1]]*beta[:,t+1])
        #termination
        return beta

    #Baum Welch algorithm: a combination of EM algorithm and HMM
    def baum_welch(self,O,e=0.05):

        row=self.A.shape[0]
        col=len(O)

        done=False
        while not done:
            zeta=numpy.zeros((row,row,col-1))
            alpha=self.forward(O)
            beta=self.backward(O)
            #EM algorithm: composed of E-step and M-step
            #E-step: calculate the expected values zeta and gamma
            for t in range(col-1):
                #Denominator part
                denominator=numpy.dot(numpy.dot(alpha[:,t],self.A)*self.B[:,O[t+1]],beta[:,t+1])
                for i in range(row):
                    #Numerator part and xi (zeta) value
                    numerator=alpha[i,t]*self.A[i,:]*self.B[:,O[t+1]]*beta[:,t+1]
                    zeta[i,:,t]=numerator/denominator
            gamma=numpy.sum(zeta,axis=1)
            final_numerator=(alpha[:,col-1]*beta[:,col-1]).reshape(-1,1)
            final=final_numerator/numpy.sum(final_numerator)
            gamma=numpy.hstack((gamma,final))
            #M-step: re estimate parameters Pi,A,B
            newPi=gamma[:,0]
            #Re-estimate A: sum of xi over time divided by sum of gamma over time, per row
            newA=numpy.sum(zeta,axis=2)/numpy.sum(gamma[:,:-1],axis=1).reshape(-1,1)
            newB=numpy.copy(self.B)
            b_denominator=numpy.sum(gamma,axis=1)
            for k in range(self.B.shape[1]):
                #Indicator vector: 1 where the observation at time t equals symbol k (reset for each k)
                temp_matrix=numpy.zeros((1,len(O)))
                for t in range(len(O)):
                    if O[t]==k:
                        temp_matrix[0][t]=1
                newB[:,k]=numpy.sum(gamma*temp_matrix,axis=1)/b_denominator
            #Stopping Threshold
            if numpy.max(abs(self.Pi-newPi))<e and numpy.max(abs(self.A-newA))<e and numpy.max(abs(self.B-newB))<e:
                done=True 
            self.A=newA
            self.B=newB
            self.Pi=newPi
        return self.Pi


#Convert dictionary to matrix
def matrix(X,index1,index2):
    #Initialize to 0 matrix
    m = numpy.zeros((len(index1),len(index2)))
    for row in X:
        for col in X[row]:
            #conversion
            m[index1.index(row)][index2.index(col)]=X[row][col]
    return m

if __name__ == "__main__":  
    #Initialization: initial guesses for the parameters A, B, Pi
    status=["Get along","Bye-bye"]
    observations=["Act like a spoiled child","Head down and play with cell phones","Friendly eyes","Take the initiative to leave contact information"] #Coquettish: punch you in the chest
    A={"Get along":{"Get along":0.5,"Bye-bye":0.5},"Bye-bye":{"Get along":0.5,"Bye-bye":0.5}}
    B={"Get along":{"Act like a spoiled child":0.4,"Head down and play with cell phones":0.1,"Friendly eyes":0.3,"Take the initiative to leave contact information":0.2},"Bye-bye":{"Act like a spoiled child":0.1,"Head down and play with cell phones":0.5,"Friendly eyes":0.2,"Take the initiative to leave contact information":0.2}}
    Pi=[0.5,0.5]
    O=[1,2,0,2,3,0]

    A=matrix(A,status,status)
    B=matrix(B,status,observations)
    hmm=HMM(A,B,Pi)
    print(hmm.baum_welch(O))

Tags: Python

Posted on Sun, 15 Mar 2020 04:25:59 -0700 by majocmatt