While reprinting this article, I found some parts of the original hard to follow, so I read other articles and added the code implementation.

Contents

1. Training data includes both the observation sequence and the state sequence

1. Computing the initial probability

2. Computing the transition probability

3. Computing the emission probability

2. Training data contains only the observation sequence

Baum-Welch algorithm: solving for π

Baum-Welch algorithm: solving for A

Baum-Welch algorithm: solving for B

If the training data contains both the observation sequence and the state sequence, the HMM learning problem is simple: it is a supervised learning problem.

If the training data contains only the observation sequence, the HMM learning problem must be solved with the EM algorithm: it is an unsupervised learning problem.

# 1. Training data includes both the observation sequence and the state sequence

Using the law-of-large-numbers conclusion that "frequency converges to probability", the parameter estimates of the HMM are given directly by counting frequencies;

The formulas are explained below.

## 1. Computing the initial probability

In the formula, |S_i| is the number of occurrences of state i, and Σ_k|S_k| is the total number of states over all time points. An example is shown as follows:
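The counting scheme above can be sketched in a few lines of Python. The state sequences here are made-up training data for illustration; the estimate divides each state's count by the total count over all time points, as in the formula:

```python
from collections import Counter

# Hypothetical labeled state sequences: each inner list is the hidden-state
# sequence of one training example (states are made up for illustration).
state_seqs = [
    ["box1", "box2", "box1"],
    ["box2", "box2", "box1"],
    ["box1", "box1", "box2"],
]

counts = Counter(s for seq in state_seqs for s in seq)  # |S_i|
total = sum(counts.values())                            # sum_k |S_k|
pi = {s: counts[s] / total for s in counts}             # box1: 5/9, box2: 4/9
print(pi)
```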

## 2. Computing the transition probability

In the formula, a_ij is the probability of transitioning from state i to state j; |S_ij| is the number of transitions from state i to state j; Σ_k|S_ik| is the total number of transitions out of state i over all time points. An example is shown in the figure below:

Note that the formula in the figure above has a small problem: the summation index in the denominator (the j in S_ij) should be renamed to another letter, such as k, so there is no ambiguity.
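A sketch of this transition count, again on made-up state sequences, with the summation index written as k per the note above:

```python
from collections import Counter

# Hypothetical labeled state sequences (same illustrative data as before)
state_seqs = [
    ["box1", "box2", "box1"],
    ["box2", "box2", "box1"],
    ["box1", "box1", "box2"],
]
states = ["box1", "box2"]

# |S_ij|: count transitions from state i to state j over all time points
trans = Counter()
for seq in state_seqs:
    for i, j in zip(seq, seq[1:]):
        trans[(i, j)] += 1

# a_ij = |S_ij| / sum_k |S_ik|  (normalize by all departures from state i)
A = {}
for i in states:
    out_total = sum(trans[(i, k)] for k in states)
    A[i] = {j: trans[(i, j)] / out_total if out_total else 0.0 for j in states}
print(A)  # each row of A sums to 1
```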

## 3. Computing the emission probability

Here b_ij counts the number of times observation j is emitted from state i.

For example, the probability of emitting the observation "white ball" from the state "box 1" is b_{1,white}:

b_{1,white} = count(1 → white) / [count(1 → white) + count(1 → black)]

As in the previous formula, the summation index in the denominator (the j in q_ij) should be renamed to another letter, such as k, to avoid confusion.
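The box-and-ball counting can be sketched the same way. The paired state and observation sequences are invented for illustration; the estimate reproduces the b_{1,white} formula above:

```python
from collections import Counter

# Hypothetical paired (state, observation) sequences for the box-and-ball example
state_seqs = [["box1", "box2", "box1"], ["box2", "box2", "box1"]]
obs_seqs = [["white", "black", "white"], ["black", "white", "black"]]
states, symbols = ["box1", "box2"], ["white", "black"]

# Count how often each state emits each observation symbol
emit = Counter()
for ss, oo in zip(state_seqs, obs_seqs):
    for s, o in zip(ss, oo):
        emit[(s, o)] += 1

# b_i(o) = count(state i emits o) / sum_k count(state i emits symbol k)
B = {}
for s in states:
    row_total = sum(emit[(s, k)] for k in symbols)
    B[s] = {o: emit[(s, o)] / row_total if row_total else 0.0 for o in symbols}
print(B)  # here B["box1"]["white"] = 2/3, as in the b_{1,white} formula
```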

# 2. Training data contains only the observation sequence

With only an observation sequence and no state sequence, the Baum-Welch algorithm must be used. This post does not explain Baum-Welch in detail; for the full derivation and implementation, refer to other posts such as "Detailed explanation of the Baum-Welch algorithm for hidden Markov models" and "From the EM algorithm to the Baum-Welch algorithm". The derivation of Baum-Welch relies on the EM algorithm and on the conclusions of the forward and backward algorithms from the HMM evaluation problem.

Following the E-step of the EM algorithm, we obtain the Q function and then maximize it:
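The figure with the Q function is not reproduced here; for reference, it takes the standard form (a sketch of the usual derivation, not taken from the original figure):

```latex
Q(\lambda, \bar{\lambda}) = \sum_{I} \log P(O, I \mid \lambda)\, P(O, I \mid \bar{\lambda})
```

Expanding $\log P(O, I \mid \lambda) = \log \pi_{i_1} + \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} + \sum_{t=1}^{T} \log b_{i_t}(o_t)$ splits $Q$ into three independent terms, which is why $\pi$, $A$, and $B$ can each be maximized separately in the three subsections below.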

## Baum-Welch algorithm: solving for π

To maximize L, use the Lagrange multiplier method to solve for π:
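Since the figure with the result is not shown, here is the standard update this step arrives at, where $\gamma_1(i)$ is the probability of being in state $i$ at time 1 given the observations and the current parameters $\bar{\lambda}$:

```latex
\pi_i = \frac{P(O, i_1 = i \mid \bar{\lambda})}{P(O \mid \bar{\lambda})} = \gamma_1(i)
```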

## Baum-Welch algorithm: solving for A

To maximize L, use the Lagrange multiplier method to solve for a_ij:
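The standard update that this maximization yields (reproduced here since the figure is missing), where $\xi_t(i, j)$ is the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$ given the observations and the current parameters:

```latex
a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
```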

## Baum-Welch algorithm: solving for B

To maximize L, use the Lagrange multiplier method to solve for b_ij:
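The standard update for the emission probabilities (again reproduced since the figure is missing): the numerator sums $\gamma$ only over the times at which symbol $v_k$ is observed, which is exactly what the indicator matrix in the code implementation later in the post computes:

```latex
b_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```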

Maximizing the L function in the Baum-Welch algorithm thus yields the values of π, A, and B respectively.

where:

## Code implementation

The code comes from the post "From the EM algorithm to the Baum-Welch algorithm"; read it together with the blog "HMM model of the university canteen (3): the Baum-Welch algorithm". I have not had time to verify this code yet and will do so when I get the chance; for now, understanding HMM theory and moving on to CRF is what matters most to me.

```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
'''
Created on February 9, 2017
@author: Xue Lei Lei
'''
import numpy


class HMM:
    def __init__(self, A, B, Pi):
        self.A = A
        self.B = B
        self.Pi = Pi

    # Forward algorithm
    def forward(self, O):
        row = self.A.shape[0]
        col = len(O)
        alpha = numpy.zeros((row, col))
        # Initialization
        alpha[:, 0] = self.Pi * self.B[:, O[0]]
        # Recurrence
        for t in range(1, col):
            for i in range(row):
                alpha[i, t] = numpy.dot(alpha[:, t - 1], self.A[:, i]) * self.B[i, O[t]]
        # Termination
        return alpha

    # Backward algorithm
    def backward(self, O):
        row = self.A.shape[0]
        col = len(O)
        beta = numpy.zeros((row, col))
        # Initialization
        beta[:, -1:] = 1
        # Recurrence
        for t in reversed(range(col - 1)):
            for i in range(row):
                beta[i, t] = numpy.sum(self.A[i, :] * self.B[:, O[t + 1]] * beta[:, t + 1])
        # Termination
        return beta

    # Baum-Welch algorithm: EM applied to the HMM
    def baum_welch(self, O, e=0.05):
        row = self.A.shape[0]
        col = len(O)
        done = False
        while not done:
            zeta = numpy.zeros((row, row, col - 1))
            alpha = self.forward(O)
            beta = self.backward(O)
            # E-step: compute the expected counts zeta and gamma
            for t in range(col - 1):
                # Denominator part
                denominator = numpy.dot(
                    numpy.dot(alpha[:, t], self.A) * self.B[:, O[t + 1]], beta[:, t + 1])
                for i in range(row):
                    # Numerator part, then the zeta values for time t
                    numerator = alpha[i, t] * self.A[i, :] * self.B[:, O[t + 1]] * beta[:, t + 1]
                    zeta[i, :, t] = numerator / denominator
            gamma = numpy.sum(zeta, axis=1)
            final_numerator = (alpha[:, col - 1] * beta[:, col - 1]).reshape(-1, 1)
            final = final_numerator / numpy.sum(final_numerator)
            gamma = numpy.hstack((gamma, final))
            # M-step: re-estimate the parameters Pi, A, B
            newPi = gamma[:, 0]
            # The denominator is reshaped to divide row-wise: a_ij is normalized
            # by the expected number of departures from state i
            newA = numpy.sum(zeta, axis=2) / numpy.sum(gamma[:, :-1], axis=1).reshape(-1, 1)
            newB = numpy.copy(self.B)
            b_denominator = numpy.sum(gamma, axis=1)
            for k in range(self.B.shape[1]):
                # Indicator row: 1 where the observation at time t is symbol k
                # (rebuilt for every k so stale entries do not carry over)
                temp_matrix = numpy.zeros((1, len(O)))
                for t in range(len(O)):
                    if O[t] == k:
                        temp_matrix[0][t] = 1
                newB[:, k] = numpy.sum(gamma * temp_matrix, axis=1) / b_denominator
            # Stopping threshold
            if (numpy.max(abs(self.Pi - newPi)) < e
                    and numpy.max(abs(self.A - newA)) < e
                    and numpy.max(abs(self.B - newB)) < e):
                done = True
            self.A = newA
            self.B = newB
            self.Pi = newPi
        return self.Pi


# Convert a nested dictionary to a matrix
def matrix(X, index1, index2):
    # Initialize a zero matrix
    m = numpy.zeros((len(index1), len(index2)))
    for row in X:
        for col in X[row]:
            # Conversion
            m[index1.index(row)][index2.index(col)] = X[row][col]
    return m


if __name__ == "__main__":
    # Initialization: assign the parameters A, B, Pi
    status = ["Get along", "Bye-bye"]
    observations = ["Act like a spoiled child", "Head down and play with cell phones",
                    "Friendly eyes", "Take the initiative to leave contact information"]
    # Acting coquettish: a punch in your chest
    A = {"Get along": {"Get along": 0.5, "Bye-bye": 0.5},
         "Bye-bye": {"Get along": 0.5, "Bye-bye": 0.5}}
    B = {"Get along": {"Act like a spoiled child": 0.4, "Head down and play with cell phones": 0.1,
                       "Friendly eyes": 0.3, "Take the initiative to leave contact information": 0.2},
         "Bye-bye": {"Act like a spoiled child": 0.1, "Head down and play with cell phones": 0.5,
                     "Friendly eyes": 0.2, "Take the initiative to leave contact information": 0.2}}
    Pi = [0.5, 0.5]
    O = [1, 2, 0, 2, 3, 0]
    A = matrix(A, status, status)
    B = matrix(B, status, observations)
    hmm = HMM(A, B, Pi)
    print(hmm.baum_welch(O))
```