CSE 250A. Assignment 6

Supplementary reading:
• Russell & Norvig, Chapter 15.
• L. R. Rabiner (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.
6.1 Viterbi algorithm
In this problem, you will decode an English sentence from a long sequence of non-text observations. To
do so, you will implement the same basic algorithm used in most engines for automatic speech recognition.
In a speech recognizer, these observations would be derived from real-valued measurements of acoustic
waveforms. Here, for simplicity, the observations only take on binary values, but the high-level concepts are
the same.
Consider a discrete HMM with n = 26 hidden states St ∈ {1, 2, . . . , n} and binary observations Ot ∈ {0, 1}.
Download the ASCII data files from the course web site for this assignment. These files contain parameter
values for the initial state distribution πi = P(S1 = i), the transition matrix aij = P(St+1 = j|St = i), and
the emission matrix bik = P(Ot =k|St =i), as well as a long bit sequence of T = 75000 observations.
Use the Viterbi algorithm to compute the most probable sequence of hidden states conditioned on this
particular sequence of observations. Turn in the following:
(a) a hard-copy print-out of your source code
(b) a plot of the most likely sequence of hidden states versus time.
You may program in the language of your choice, but it will behoove you to consider the efficiency of your
implementation in addition to its correctness. Well-written code should execute in seconds (or less).
To check your answer: suppose that the hidden states {1, 2, . . . , 26} represent the letters {a, b, . . . , z} of
the English alphabet. The most probable sequence of hidden states (ignoring repeated letters) will reveal a
timely message for the upcoming holiday season.
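For reference, here is a minimal Viterbi sketch in log space. It assumes the parameters have already been loaded into NumPy arrays pi, a, and b and the observations into an integer array obs; these names, and the assumption of strictly positive parameters, are placeholders rather than part of the assignment.

```python
import numpy as np

def viterbi(pi, a, b, obs):
    """Most probable hidden-state sequence of a discrete HMM, computed in log space.

    pi:  (n,)   initial distribution, pi[i] = P(S_1 = i+1)
    a:   (n, n) transition matrix,    a[i, j] = P(S_{t+1} = j+1 | S_t = i+1)
    b:   (n, m) emission matrix,      b[i, k] = P(O_t = k | S_t = i+1)
    obs: (T,)   integer-coded observations (here m = 2, so obs is 0/1)
    """
    T, n = len(obs), len(pi)
    log_a, log_b = np.log(a), np.log(b)    # assumes strictly positive parameters
    delta = np.log(pi) + log_b[:, obs[0]]  # best log-probability of a path ending in each state
    back = np.zeros((T, n), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_a    # scores[i, j]: best path into i, then transition i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(n)] + log_b[:, obs[t]]
    path = np.zeros(T, dtype=int)          # trace back the most probable path
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path + 1                        # states numbered 1..n, as in the problem
```

The returned sequence can then be plotted against t for part (b).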
6.2 Inference in HMMs
Consider a discrete HMM with hidden states St, observations Ot, transition matrix aij = P(St+1 = j | St = i), and emission matrix bik = P(Ot = k | St = i). In class, we defined the forward-backward probabilities:

$$\alpha_{it} = P(o_1, o_2, \ldots, o_t, S_t = i), \qquad \beta_{it} = P(o_{t+1}, o_{t+2}, \ldots, o_T \,|\, S_t = i),$$

for a particular observation sequence {o1, o2, . . . , oT} of length T. This problem will increase your familiarity with these definitions. In terms of these probabilities, which you may assume to be given, as well as the
transition and emission matrices of the HMM, show how to (efficiently) compute the following probabilities:
(a) P(St =i|St+1 =j, o1, o2, . . . , oT )
(b) P(St+1 =j|St =i, o1, o2, . . . , oT )
(c) P(St+1 =k|St−1 =i, St =j, o1, o2, . . . , oT )
(d) P(St+1 =k|St−1 =i, o1, o2, . . . , oT )
In all these problems, you may assume that t > 1 and t < T; in particular, you are not asked to consider the
boundary cases.
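As a pattern to imitate, recall the standard smoothing identity (this is not one of the parts above, but it shows how α, β, and a normalizing sum combine):

$$P(S_t = i \,|\, o_1, o_2, \ldots, o_T) \;=\; \frac{P(o_1, \ldots, o_t, S_t = i)\, P(o_{t+1}, \ldots, o_T \,|\, S_t = i)}{P(o_1, o_2, \ldots, o_T)} \;=\; \frac{\alpha_{it}\, \beta_{it}}{\sum_j \alpha_{jt}\, \beta_{jt}},$$

where the first step uses Bayes rule together with the conditional independence of past and future observations given St.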
6.3 Belief updating
In this problem, you will derive recursion relations for real-time updating of beliefs based on incoming
evidence. These relations are useful for situated agents that must monitor their environments in real-time.
(a) Consider the discrete hidden Markov model (HMM) with hidden states St, observations Ot, transition matrix aij, and emission matrix bik. Let

qit = P(St = i | o1, o2, . . . , ot)

denote the conditional probability that St is in the ith state of the HMM based on the evidence up to and including time t. Derive the recursion relation:

$$q_{jt} \;=\; \frac{1}{Z_t}\, b_j(o_t) \sum_i a_{ij}\, q_{i,t-1} \qquad \text{where} \qquad Z_t \;=\; \sum_{ij} b_j(o_t)\, a_{ij}\, q_{i,t-1}.$$
Justify each step in your derivation—for example, by appealing to Bayes rule or properties of conditional independence.
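For intuition, a minimal numerical sketch of this recursion (not the requested derivation) might look as follows; the array names pi, a, b, and observations are hypothetical placeholders.

```python
import numpy as np

def filter_beliefs(pi, a, b, observations):
    """q[t, j] = P(S_t = j | o_1, ..., o_t), updated recursively as above.

    pi: (n,)   initial state distribution
    a:  (n, n) transition matrix, a[i, j] = P(S_{t+1} = j | S_t = i)
    b:  (n, m) emission matrix,   b[i, k] = P(O_t = k | S_t = i)
    observations: length-T sequence of integer-coded observations
    """
    T, n = len(observations), len(pi)
    q = np.zeros((T, n))
    q[0] = b[:, observations[0]] * pi        # base case at t = 1
    q[0] /= q[0].sum()
    for t in range(1, T):
        unnormalized = b[:, observations[t]] * (q[t - 1] @ a)  # b_j(o_t) * sum_i a_ij q_{i,t-1}
        q[t] = unnormalized / unnormalized.sum()               # division by Z_t
    return q
```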
(b) Consider the dynamical system with continuous, real-valued hidden states Xt and observations Yt, represented by the belief network shown below. By analogy to the previous problem (replacing sums by integrals), derive the recursion relation:

$$P(x_t \,|\, y_1, y_2, \ldots, y_t) \;=\; \frac{1}{Z_t}\, P(y_t|x_t) \int dx_{t-1}\, P(x_t|x_{t-1})\, P(x_{t-1}|y_1, y_2, \ldots, y_{t-1}),$$

where Zt is the appropriate normalization factor,

$$Z_t \;=\; \int dx_t\, P(y_t|x_t) \int dx_{t-1}\, P(x_t|x_{t-1})\, P(x_{t-1}|y_1, y_2, \ldots, y_{t-1}).$$
In principle, an agent could use this recursion for real-time updating of beliefs in arbitrarily complicated continuous worlds. In practice, why is this difficult for all but Gaussian random variables?
[Figure: belief network with hidden-state chain X1 → X2 → X3 → X4 → · · ·, where each Xt has an observed child Yt.]
6.4 Continuous density HMM
In class, we studied discrete HMMs with discrete hidden states and observations, as well as linear dynamical
systems with continuous hidden states and observations.
This problem considers a continuous density HMM, which has discrete hidden states but continuous observations. Let St ∈ {1, 2, ..., n} denote the hidden state of the HMM at time t, and let Xt ∈ ℝ denote the
real-valued scalar observation of the HMM at time t. The continuous density HMM makes the same Markov
assumptions as the discrete HMM in class. In particular, the joint distribution over sequences S = {S1, S2, . . . , ST} and X = {X1, X2, . . . , XT} is given by:

$$P(S, X) \;=\; P(S_1) \prod_{t=2}^{T} P(S_t|S_{t-1}) \prod_{t=1}^{T} P(X_t|S_t).$$
In a continuous density HMM, however, the distribution P(Xt|St) must be parameterized since the random variable Xt is no longer discrete. Suppose that the observations are modeled as Gaussian random variables:

$$P(X_t = x \,|\, S_t = i) \;=\; \frac{1}{\sqrt{2\pi\sigma_i^{2}}}\, \exp\!\left[-\,\frac{(x-\mu_i)^{2}}{2\sigma_i^{2}}\right]$$
with state-dependent means and variances. Indicate whether each of the following distributions is Gaussian
(univariate or multivariate) or a mixture of Gaussians. Also, if the distribution is a mixture of Gaussians,
indicate how many mixture components it contains. The first problem has been done as an example.
(*) P(X1)
The distribution P(X1) is a mixture of univariate Gaussians.
It contains n mixture components because it can be written as
$$P(X_1) \;=\; \sum_{i=1}^{n} P(X_1|S_1 = i)\, P(S_1 = i).$$
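To make the example concrete, a small numerical sketch (with made-up parameters; n, pi, mu, and sigma are hypothetical names) evaluates P(X1) as exactly this n-component mixture:

```python
import numpy as np
from scipy.stats import norm

# Made-up continuous density HMM parameters with n = 3 hidden states.
pi = np.array([0.5, 0.3, 0.2])       # P(S_1 = i)
mu = np.array([-1.0, 0.0, 2.0])      # state-dependent means mu_i
sigma = np.array([0.5, 1.0, 1.5])    # state-dependent standard deviations sigma_i

def density_X1(x):
    """P(X_1 = x) = sum_i P(X_1 = x | S_1 = i) P(S_1 = i): a mixture of n univariate Gaussians."""
    return float(np.sum(pi * norm.pdf(x, loc=mu, scale=sigma)))

print(density_X1(0.0))   # density of the 3-component mixture at x = 0
```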
(a) P(Xt | St−1)
(b) P(Xt, Xt′ | St, St′)
(c) P(X1, X2, . . . , XT)
(d) P(Xt | X1, X2, . . . , Xt−1)
(e) P(XT)
(f) P(X1, X2, . . . , Xt | S1, S2, . . . , St)
6.5 Mixture model decision boundary
Consider a multivariate Gaussian mixture model with two mixture components. The model has a hidden
binary variable y ∈ {0, 1} and an observed vector variable ~x ∈ ℝ^d, with graphical model y → x and component distributions:

$$P(y=i) \;=\; \pi_i, \qquad P(\vec{x}\,|\,y=i) \;=\; (2\pi)^{-d/2}\, |\Sigma_i|^{-1/2}\, \exp\!\left[-\tfrac{1}{2}\,(\vec{x}-\vec{\mu}_i)^{\top} \Sigma_i^{-1} (\vec{x}-\vec{\mu}_i)\right]$$
The parameters of the Gaussian mixture model are the prior probabilities π0 and π1, the mean vectors ~µ0
and ~µ1, and the covariance matrices Σ0 and Σ1.
(a) Compute the posterior distribution P(y= 1|~x) as a function of the parameters (π0, π1, ~µ0, ~µ1, Σ0, Σ1)
of the Gaussian mixture model.
(b) Consider the special case of this model where the two mixture components share the same covariance
matrix: namely, Σ0 = Σ1 = Σ. In this case, show that your answer from part (a) can be written as:
$$P(y = 1|\vec{x}) \;=\; \sigma(\vec{w} \cdot \vec{x} + b) \qquad \text{where} \qquad \sigma(z) = \frac{1}{1+e^{-z}}.$$
As part of your answer, you should express the parameters ( ~w, b) of the sigmoid function explicitly in
terms of the parameters (π0, π1, ~µ0, ~µ1, Σ) of the Gaussian mixture model.
(c) Assume again that Σ0 = Σ1 = Σ. Note that in this case, the decision boundary for the mixture model
reduces to a hyperplane; namely, we have P(y = 1|~x) = P(y = 0|~x) when ~w · ~x + b = 0. Let k > 0.
Show that the set of points for which

$$\frac{P(y = 1|\vec{x})}{P(y = 0|\vec{x})} \;=\; k$$
is also described by a hyperplane, and find the equation for this hyperplane. (These are the points
for which one class is precisely k times more likely than the other.) Of course, your answer should
recover the hyperplane decision boundary ~w · ~x + b = 0 when k = 1.
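If you want a numerical sanity check for your closed-form answers, a minimal sketch along these lines may help; the parameters below are made up, and the code simply applies Bayes rule with scipy's multivariate normal density.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up parameters with a shared covariance matrix (the setting of parts (b) and (c)).
pi0, pi1 = 0.4, 0.6
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 1.0]])

def posterior_y1(x):
    """P(y = 1 | x) by Bayes rule over the two mixture components."""
    p0 = pi0 * multivariate_normal.pdf(x, mean=mu0, cov=Sigma)
    p1 = pi1 * multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
    return p1 / (p0 + p1)

x = np.array([0.5, 1.0])
print(posterior_y1(x))   # compare against your sigma(w . x + b) from part (b)
```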