CS5340 ASSIGNMENT 3:
HIDDEN MARKOV MODELS
1. OVERVIEW
In this assignment, you will implement the Baum-Welch algorithm to find the unknown
parameters of 1-dimensional Gaussian hidden Markov models (HMMs).
References: Lecture 8
Honour Code. This coding assignment constitutes 15% of your final grade in CS5340. Note that
plagiarism will not be condoned! You may discuss with your classmates and check the internet
for references, but you MUST NOT submit code/report that is copied directly from other sources!
2. SUBMISSION INSTRUCTIONS
Items to be submitted:
• Source code (lab3.py). This is where you fill in all your code.
• Report (report.pdf). This should describe your implementation and be no more than one
page.
Please indicate clearly your name and student number (the one that looks like A1234567X) in the
report as well as the top of your source code. Zip the two files together and name it in the
following format: A1234567X_lab3.zip (replace with your student number).
3. TEST CASES
As with previous assignments, we have provided a few test cases and their expected outputs,
which can be found in the data/ folder. You can find the code that loads the sample data and
checks your program output in test_lab3.py. During grading, we will also check your code with
additional held-out test cases.
Note that for the overall Baum-Welch implementation, we are unable to provide the checking
code since your algorithm output may differ slightly from ours. Instead, we provide the
groundtruth parameters, and you should use them to check the outputs. In addition, we also list
the parameters obtained using our reference implementation in the Appendix.
Submit your assignment by 29 October 2023, 2359 HRS to Canvas. 25% of the total score will
be deducted for each day of late submission.
4. NOTATIONS
We will use the same notations as the lectures.
• 𝐾: Number of states
• 𝑁: Number of time steps
• 𝛉: Parameters {𝛑, 𝐀, 𝛟} of the HMM which we want to estimate.
• 𝛑: Initial state distribution, represented as a numpy array of size 𝐾.
• 𝐀: Time-independent stochastic transition matrix. This is represented as a 𝐾 × 𝐾 numpy
array.
• 𝛟: Parameters {𝜙1, … ,𝜙𝐾} of the conditional distribution of the emission probabilities.
For the Gaussian HMM we are considering, it consists of the means {𝜇1, … , 𝜇𝐾} and
standard deviations {𝜎1, … , 𝜎𝐾}. Each is implemented as a numpy array of size K.
• 𝐗: Observed data {𝐱1, … , 𝐱𝑁}
• 𝐙: Latent variables {𝐳1, … , 𝐳𝑁}
• 𝛾(𝐳𝑛): Marginal posterior distribution, i.e. 𝑝(𝐳𝑛|𝐗, 𝛉^old)
• 𝜉(𝐳𝑛−1, 𝐳𝑛): Joint posterior distribution of two successive latent variables, 𝑝(𝐳𝑛−1, 𝐳𝑛|𝐗, 𝛉^old)
5. BAUM-WELCH ALGORITHM
As we have learned in the lectures, the Baum-Welch algorithm, also known as the forward-backward algorithm, is used to find maximum-likelihood estimates of the unknown parameters of a hidden Markov model, i.e.
\arg\max_{\boldsymbol{\theta}} \; p(\mathbf{X} \mid \boldsymbol{\theta})
The above quantity is intractable to maximise directly, so the Baum-Welch algorithm uses an
Expectation-Maximization (EM) scheme to perform the maximization. The EM algorithm starts
with an initialization of the model parameters, which we denote by 𝛉^old. It then alternates
between the E-step and M-step.
The E-step finds the posterior distribution 𝑝(𝐙|𝐗, 𝛉^old) of the latent variables. This allows us to
evaluate the expectation of the complete-data log likelihood:
Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \mathbb{E}_{\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}}\left[\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})\right] = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})
The M-step then maximises 𝑄(𝛉, 𝛉^old) with respect to the parameters 𝛉.
5.1 E-STEP
For an HMM, the expected complete-data log likelihood can be represented as:
Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \sum_{k=1}^{K} \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^{N} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} + \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \ln p(\mathbf{x}_n \mid \boldsymbol{\phi}_k)
The goal of the E-step is thus to compute the marginal and joint posterior distributions 𝛾(𝐳𝑛) and 𝜉(𝐳𝑛−1, 𝐳𝑛).
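For reference, these two quantities can be expressed in terms of the forward messages 𝛼 and backward messages 𝛽 (this is the standard alpha-beta result; please verify against the lecture slides):
\gamma(\mathbf{z}_n) = \frac{\alpha(\mathbf{z}_n)\,\beta(\mathbf{z}_n)}{p(\mathbf{X})}, \qquad \xi(\mathbf{z}_{n-1}, \mathbf{z}_n) = \frac{\alpha(\mathbf{z}_{n-1})\, p(\mathbf{x}_n \mid \mathbf{z}_n)\, p(\mathbf{z}_n \mid \mathbf{z}_{n-1})\, \beta(\mathbf{z}_n)}{p(\mathbf{X})}
When the scaled messages are used, the normaliser 𝑝(𝐗) is absorbed into the scaling factors.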
Now, implement the function e_step() using the alpha-beta variant of the algorithm taught in
the lectures. This function takes as input 𝐗 and the estimated parameters from the M-step, 𝛉 = {𝛑, 𝐀, 𝛟}. It outputs 𝛾(𝐳𝑛) and 𝜉(𝐳𝑛−1, 𝐳𝑛).
Since the training data 𝐗 is a list of sequences, the outputs 𝛾(𝐳𝑛) and 𝜉(𝐳𝑛−1, 𝐳𝑛) from your
function should also be lists of the same cardinality. You should compute the outputs for each
sequence separately and collect them into a list.
As mentioned in the lectures, the values of 𝛼 can go to zero quickly for long sequences. You will
need to implement the scaling factors to receive full credit for this portion.
Additional Hints:
• We have provided two sequences to check the implementation of your E-step. Your
algorithm should work on the short sequence even without the scaling factors; you can
use it to check your implementation before you include the scaling factors.
• The indexing of dimensions may be tricky. Use the time indices (i.e. whether it’s 𝐳𝑛−1 or
𝐳𝑛) to keep track of which dimensions should add/multiply with each other.
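Below is a minimal sketch of the scaled alpha-beta recursions for a single 1D sequence, intended only to illustrate the structure of the computation. The names x, pi, A, mu, sigma and the return shapes are assumptions for this sketch and will not match the lab3.py skeleton exactly; your e_step() must additionally loop over the list of sequences in 𝐗.

import numpy as np
from scipy.stats import norm

def e_step_single(x, pi, A, mu, sigma):
    """Scaled alpha-beta recursions for one 1D observation sequence (sketch only)."""
    x = np.asarray(x)
    N, K = len(x), len(pi)

    # Emission probabilities p(x_n | z_n = k) for every n and k
    B = norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])   # (N, K)

    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    c = np.zeros(N)   # scaling factors, one per time step

    # Forward pass: normalise alpha at every step so it sums to one
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = B[n] * (alpha[n - 1] @ A)
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]

    # Backward pass, reusing the same scaling factors
    beta[N - 1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[n + 1] * beta[n + 1])) / c[n + 1]

    # Posteriors: gamma is (N, K), xi is (N-1, K, K)
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[1:] * beta[1:])[:, None, :]) / c[1:, None, None]
    return gamma, xi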
5.2 M-STEP
Now, write the code for m_step(), which implements the M-step of the Baum-Welch algorithm.
This function should take in 𝐗, 𝛾(𝐳𝑛), 𝜉(𝐳𝑛−1, 𝐳𝑛) and estimate the HMM parameters 𝛉 = {𝛑, 𝐀, 𝛟}.
For this assignment, we will only consider 1D Gaussian HMMs, so the emission probability
parameters 𝛟 consist of the means {𝜇1, …, 𝜇𝐾} and standard deviations {𝜎1, …, 𝜎𝐾}.
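For intuition, the following is a minimal sketch of the closed-form updates for a single sequence, assuming gamma has shape (N, K) and xi has shape (N-1, K, K) as in the E-step sketch above. The actual m_step() should accumulate the corresponding sufficient statistics over all sequences in 𝐗 before normalising.

import numpy as np

def m_step_single(x, gamma, xi):
    """Closed-form Baum-Welch updates for one sequence (sketch only)."""
    x = np.asarray(x)

    pi = gamma[0] / gamma[0].sum()              # initial state distribution
    A = xi.sum(axis=0)                          # expected transition counts
    A = A / A.sum(axis=1, keepdims=True)        # row-normalise into a stochastic matrix

    Nk = gamma.sum(axis=0)                      # expected occupancy of each state
    mu = (gamma * x[:, None]).sum(axis=0) / Nk  # responsibility-weighted means
    var = (gamma * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nk
    sigma = np.sqrt(var)                        # responsibility-weighted std deviations
    return pi, A, mu, sigma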
5.3 PUTTING THEM TOGETHER
You are now ready to put the e-step and m-step together. Implement fit_hmm() which calls your
implemented e- and m-step functions to estimate the HMM parameters.
To facilitate grading, we have implemented an initialization routine for 𝛉 = {𝛑, 𝐀, 𝛟} which uses
a fixed random seed. The randomly initialised 𝛑, 𝐀 will satisfy the required summation and
non-negativity constraints. To obtain the initial Gaussian parameters 𝛟 = {𝛍, 𝛔}, we run k-means
clustering on the observed measurements, which provides a good starting point for your EM
algorithm. You should make use of the initialized 𝛉 in your first iteration of the e-step.
The last thing to handle is the convergence condition. You can use any reasonable scheme here;
our suggestion is to monitor the parameters 𝛉 and terminate when their values change very little
between iterations (e.g. the maximum parameter change is below 1e-4).
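As an illustration, a possible structure for fit_hmm() is sketched below. The helper names init_params(), e_step() and m_step() are stand-ins for the corresponding routines in the lab3.py skeleton, and the convergence test follows the suggestion above.

import numpy as np

def fit_hmm(X, K, max_iters=500, tol=1e-4):
    """Alternate E- and M-steps until the parameters stop changing (sketch only)."""
    # Provided fixed-seed initialization (random pi, A + k-means for mu, sigma)
    pi, A, mu, sigma = init_params(X, K)

    for _ in range(max_iters):
        gamma_list, xi_list = e_step(X, pi, A, mu, sigma)
        new_pi, new_A, new_mu, new_sigma = m_step(X, gamma_list, xi_list)

        # Convergence: largest absolute change across all parameters
        delta = max(np.abs(new_pi - pi).max(), np.abs(new_A - A).max(),
                    np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        pi, A, mu, sigma = new_pi, new_A, new_mu, new_sigma
        if delta < tol:
            break

    return pi, A, mu, sigma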
As mentioned earlier, we are unable to provide the checking code for this part since your
algorithm output may differ slightly from ours. Instead, do refer to the appendix for the
groundtruth parameters used to generate the observations, as well as the values estimated by
our reference implementation.
During grading, we will evaluate your solution using the likelihood 𝑝(𝐗|𝛉).
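As a side note, if you use the scaled forward recursion, the log likelihood of a sequence can be read off directly from the scaling factors (a standard property of the scaled alpha-beta algorithm; verify against the lecture notes):
\ln p(\mathbf{X} \mid \boldsymbol{\theta}) = \sum_{n=1}^{N} \ln c_n
For multiple sequences, the total log likelihood is the sum over sequences.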
APPENDIX: TRAINING DATASETS
In this section, we show the data we provided for checking your code, as well as our estimated
parameters.
Figure 1: "seq_short " dataset, which consists of 200 sequences each having 8 timesteps: (a) The groundtruth model
used to generate the observations, and (b) Estimated parameters using the reference implementation.
Figure 2: "seq_long" dataset, which consists of 5 sequences each having 1000 timesteps. (a) The groundtruth model
used to generate the observations, and (b) Estimated parameters using the reference implementation.