CSC2515
Homework 5
Submission: You need to submit your answers to all 3 questions through MarkUs (https://markus.teach.cs.toronto.edu/csc2515-2019-09) as a PDF file titled hw5_writeup.pdf. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner), as long as it is readable.
Late Submission: 10% of the marks will be deducted for each day late, up to a maximum of 3
days. After that, no submissions will be accepted.
Collaboration: Homeworks are individual work. See the course web page for detailed policies.
1. [5 points] EM for Probabilistic PCA. In lecture, we covered the EM algorithm applied
to mixture-of-Gaussians models. In this question, we'll look at another interesting example of EM, one where the latent variables are continuous: probabilistic PCA. This is a model
very similar in spirit to PCA: we have data in a high-dimensional space, and we’d like to
summarize it with a lower-dimensional representation. Unlike ordinary PCA, we formulate
the problem in terms of a probabilistic model. We assume the latent code vector z is drawn
from a standard Gaussian distribution N (0, I), and that the observations are drawn from a
spherical Gaussian whose mean is a linear function of z. We’ll consider the slightly simplified
case of scalar-valued z (i.e. only one principal component). The probabilistic model is given
by:
\[
z \sim \mathcal{N}(0, 1)
\]
\[
\mathbf{x} \mid z \sim \mathcal{N}(z\mathbf{u},\, \sigma^2 \mathbf{I}),
\]
where $\sigma^2$ is the noise variance (which we assume to be fixed) and $\mathbf{u}$ is a parameter vector (which, intuitively, should correspond to the top principal component). Note that the observation model can be written in terms of coordinates:
\[
x_j \mid z \sim \mathcal{N}(z u_j,\, \sigma^2).
\]
We have a set of observations $\{\mathbf{x}^{(i)}\}_{i=1}^{N}$, and $z$ is a latent variable, analogous to the mixture component in a mixture-of-Gaussians model.
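For concreteness, here is a minimal sketch (in Python) of sampling a dataset from this model; the dimension, noise level, and parameter vector below are arbitrary illustrative choices, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 5, 1000          # data dimension and number of observations (arbitrary)
sigma = 0.5             # fixed noise standard deviation (arbitrary)
u = rng.normal(size=D)  # stand-in for the parameter vector u (arbitrary)

z = rng.normal(size=N)                                # z ~ N(0, 1)
X = z[:, None] * u + sigma * rng.normal(size=(N, D))  # x | z ~ N(z u, sigma^2 I)
```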
In this question, you’ll derive both the E-step and the M-step for the EM algorithm.
(a) E-step (2 points). In this step, your job is to calculate the statistics of the posterior
distribution q(z) = p(z | x) which you’ll need for the M-step. In particular, your job is
to find formulas for the (univariate) statistics:
\[
m = \mathbb{E}[z \mid \mathbf{x}], \qquad s = \mathbb{E}[z^2 \mid \mathbf{x}].
\]
Tips:
• First determine the conditional distribution $p(z \mid \mathbf{x})$ using the Gaussian conditioning formulas from the Appendix. To help you check your work: $p(z \mid \mathbf{x})$ is a univariate Gaussian distribution whose mean is a linear function of $\mathbf{x}$, and whose variance does not depend on $\mathbf{x}$.
• Once you've determined the conditional distribution (and hence the posterior mean and variance), use the fact that $\mathrm{Var}(Y) = \mathbb{E}[Y^2] - \mathbb{E}[Y]^2$ for any random variable $Y$. A numerical sanity check for your final formulas is sketched below.
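Once you have candidate formulas for $m$ and $s$, one optional way to sanity-check them is to estimate the two posterior expectations by self-normalized importance sampling, using the prior as the proposal. In this sketch the observation x and the parameter vector u are arbitrary stand-ins, not values from the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
D, sigma = 5, 0.5
u = rng.normal(size=D)   # illustrative parameter vector
x = rng.normal(size=D)   # an arbitrary observation to condition on

# Proposals from the prior p(z) = N(0, 1); the self-normalized importance
# weights are then proportional to the likelihood p(x | z).
z = rng.normal(size=200_000)
log_w = -0.5 * np.sum((x[None, :] - z[:, None] * u[None, :]) ** 2, axis=1) / sigma**2
w = np.exp(log_w - log_w.max())
w /= w.sum()

m_est = np.sum(w * z)     # Monte Carlo estimate of m = E[z | x]
s_est = np.sum(w * z**2)  # Monte Carlo estimate of s = E[z^2 | x]
print(m_est, s_est)       # compare against your closed-form m and s
```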
(b) M-step (3 points). In this step, we need to re-estimate the parameters, which consist
of the vector u. (Recall that we’re treating σ as fixed.) Your job is to derive a formula
for $\mathbf{u}_{\mathrm{new}}$ that maximizes the expected log-likelihood, i.e.,
\[
\mathbf{u}_{\mathrm{new}} \leftarrow \arg\max_{\mathbf{u}} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{q(z^{(i)})}\!\left[\log p(z^{(i)}, \mathbf{x}^{(i)})\right].
\]
(Recall that $q(z)$ is the distribution computed in part (a).) This is the new estimate obtained by the EM procedure, and will be used again in the next iteration of the E-step. Your answer should be given in terms of the $m^{(i)}$ and $s^{(i)}$ from the previous part. (I.e., you don't need to expand out the formulas for $m^{(i)}$ and $s^{(i)}$ in this step, because if you were implementing this algorithm, you'd use the values $m^{(i)}$ and $s^{(i)}$ that you previously computed.)
Tips:
• First expand out $\log p(z^{(i)}, \mathbf{x}^{(i)})$. You'll find that a lot of the terms don't depend on $\mathbf{u}$ and can therefore be dropped.
• Apply linearity of expectation. You should wind up with terms proportional to $\mathbb{E}_{q(z^{(i)})}[z^{(i)}]$ and $\mathbb{E}_{q(z^{(i)})}[(z^{(i)})^2]$. Replace these expectations with $m^{(i)}$ and $s^{(i)}$. You should get an equation that does not mention $z^{(i)}$. (If you don't wind up with terms of this form, then see if there's some way you can simplify $\log p(z^{(i)}, \mathbf{x}^{(i)})$.)
• In order to find the maximum likelihood parameter $\mathbf{u}_{\mathrm{new}}$, you need to determine the gradient with respect to $\mathbf{u}$, set it to zero, and solve for $\mathbf{u}_{\mathrm{new}}$. A numerical cross-check is sketched below.
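If you want to double-check your closed-form $\mathbf{u}_{\mathrm{new}}$, one optional approach is to freeze samples from $q(z^{(i)})$ (a Gaussian with mean $m^{(i)}$ and variance $s^{(i)} - (m^{(i)})^2$, by the identity above), maximize a Monte Carlo estimate of the expected log-likelihood numerically, and compare. The data and E-step statistics below are placeholders for your own $\mathbf{x}^{(i)}$, $m^{(i)}$, and $s^{(i)}$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Placeholder data and E-step statistics; in practice X comes from your
# dataset and m, s from your part (a) formulas.
D, N, K = 5, 50, 2000
sigma = 0.5
X = rng.normal(size=(N, D))
m = rng.normal(size=N)                     # stand-ins for m^(i)
s = m**2 + rng.uniform(0.1, 0.5, size=N)   # stand-ins for s^(i), so Var = s - m^2 > 0

# Freeze samples z_{ik} ~ q(z^(i)) = N(m_i, s_i - m_i^2).
Z = m[:, None] + np.sqrt(s - m**2)[:, None] * rng.normal(size=(N, K))

def neg_objective(u):
    # Negative Monte Carlo estimate of (1/N) sum_i E_q[log p(z^(i), x^(i))],
    # keeping only the terms that depend on u.
    resid = X[:, None, :] - Z[:, :, None] * u[None, None, :]  # shape (N, K, D)
    return np.sum(resid**2) / (2 * sigma**2 * N * K)

u_numeric = minimize(neg_objective, rng.normal(size=D)).x
print(u_numeric)  # should roughly match your closed-form u_new on the same X, m, s
```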
2. [2 points] Contraction Maps. In lecture, we showed that the optimal Bellman backup operator is a contraction map, and hence that value iteration converges to the optimal Q-function $Q^*$. Now consider the problem of policy evaluation, i.e. finding the Q-function $Q^\pi$ for a given (stochastic) policy $\pi$. Since $Q^\pi$ is characterized by the fixed-point equation $T^\pi Q^\pi = Q^\pi$, we can repeatedly apply the update
\[
Q_{k+1} \leftarrow T^\pi Q_k,
\]
which can be written out in full as:
\[
Q_{k+1}(s, a) \leftarrow r(s, a) + \gamma \sum_{s'} P(s' \mid a, s) \sum_{a'} \pi(a' \mid s') \, Q_k(s', a').
\]
Show that the Bellman backup operator $T^\pi$ is a contraction map in the $\|\cdot\|_\infty$ norm. Your proof will probably look very similar to the one from Slide 30 of Lecture 10, but be sure to justify each step.
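As an empirical illustration (not a substitute for the proof), you can apply the update above to two arbitrary Q-functions on a random MDP and confirm the $\gamma$-contraction in the $\infty$-norm. Everything in this sketch is an arbitrary construction:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9

# Random transition kernel, rewards, and stochastic policy (rows normalized).
P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)  # P[s, a, s']
r = rng.random((nS, nA))
pi = rng.random((nS, nA)); pi /= pi.sum(axis=1, keepdims=True)   # pi[s, a]

def T_pi(Q):
    # (T^pi Q)(s, a) = r(s, a) + gamma * sum_{s'} P(s'|a,s) sum_{a'} pi(a'|s') Q(s', a')
    V = np.sum(pi * Q, axis=1)  # V(s') = sum_{a'} pi(a' | s') Q(s', a')
    return r + gamma * (P @ V)

Q1 = rng.normal(size=(nS, nA))
Q2 = rng.normal(size=(nS, nA))
lhs = np.max(np.abs(T_pi(Q1) - T_pi(Q2)))  # ||T^pi Q1 - T^pi Q2||_inf
rhs = gamma * np.max(np.abs(Q1 - Q2))      # gamma * ||Q1 - Q2||_inf
print(lhs <= rhs + 1e-12)                  # prints True: a gamma-contraction
```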
3. [3 points] Q-Learning. In lecture, we made the claim that Q-learning only converges to the optimal Q-function if the agent follows an exploration-encouraging strategy such as $\varepsilon$-greedy. Your job is to give a counterexample to show that exploration is necessary. I.e., you will show that Q-learning might get stuck with a suboptimal Q-function if it always chooses $\pi(s) = \arg\max_a Q(s, a)$.
Consider an MDP with two states $s_1$ and $s_2$, and two actions, Stay and Switch. The environment is deterministic. If the agent chooses Stay, then it stays in the current state (i.e. $S_{t+1} = S_t$), while if it chooses Switch, it switches to the other state (i.e., if it's in $s_1$, it transitions to $s_2$, and vice versa). The reward function is given by:
\[
r(S, A) = \begin{cases} 1 & \text{if } S = s_1 \\ 2 & \text{if } S = s_2 \end{cases}
\]
The discount factor is $\gamma = 0.9$.
(a) (1 point) Determine the optimal policy and the Q-function for the optimal policy. You
should give the Q-function as a table. You don’t need to show your work or justify your
answer for this part.
(b) (2 points) Now suppose we apply Q-learning, except that instead of the $\varepsilon$-greedy policy, the agent follows the greedy policy which always chooses $\pi(s) = \arg\max_a Q(s, a)$.
Assume the agent starts in state S0 = s1. Give an example of a Q-function that is in
equilibrium (i.e. it will never change after the Q-learning update rule is applied), but
which results in a suboptimal policy. (You should specify the Q-function as a table.)
Justify your answer.
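One way to test a candidate table for part (b) is to simulate the purely greedy agent with the standard tabular Q-learning update on this MDP. This is only a sketch: the learning rate and the tie-breaking convention for the argmax are assumptions not fixed by the problem:

```python
import numpy as np

gamma, alpha = 0.9, 0.1  # gamma from the problem; alpha is an arbitrary learning rate

# States 0, 1 stand for s1, s2; actions 0, 1 stand for Stay, Switch.
def step(s, a):
    s_next = s if a == 0 else 1 - s
    reward = 1 if s == 0 else 2  # r(S, A) depends only on the current state
    return reward, s_next

Q = np.zeros((2, 2))  # <- replace with your candidate Q-table

s = 0  # the agent starts in s1
for _ in range(1000):
    a = int(np.argmax(Q[s]))  # purely greedy (np.argmax breaks ties toward Stay)
    reward, s_next = step(s, a)
    Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)  # a Q-table in equilibrium comes out exactly as it went in
```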
Appendix: Some Properties of Gaussians
Consider a multivariate Gaussian random variable $\mathbf{z}$ with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, i.e.,
\[
p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}).
\]
Now consider another Gaussian random variable $\mathbf{x}$, whose mean is an affine function of $\mathbf{z}$ (in the form made precise below), and whose covariance $\mathbf{S}$ does not depend on $\mathbf{z}$. The conditional distribution of $\mathbf{x}$ given $\mathbf{z}$ is
\[
p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{x} \mid \mathbf{A}\mathbf{z} + \mathbf{b}, \mathbf{S}).
\]
Here the matrix $\mathbf{A}$ and the vector $\mathbf{b}$ are of appropriate dimensions.
In some problems, we are interested in knowing the distribution of z given x, or the marginal
distribution of x. One can apply Bayes’ rule to find the conditional distribution p(z | x). After
some calculations, we can obtain the following useful formulae:
\[
p(\mathbf{x}) = \mathcal{N}\!\left(\mathbf{x} \mid \mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\top + \mathbf{S}\right)
\]
\[
p(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}\!\left(\mathbf{z} \mid \mathbf{C}\left(\mathbf{A}^\top \mathbf{S}^{-1}(\mathbf{x} - \mathbf{b}) + \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\right),\; \mathbf{C}\right)
\]
with
\[
\mathbf{C} = \left(\boldsymbol{\Sigma}^{-1} + \mathbf{A}^\top \mathbf{S}^{-1} \mathbf{A}\right)^{-1}.
\]
You may also find it helpful to read Section 2.3 of Bishop.
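As an aside, the marginal formula above is easy to check empirically; the sketch below uses arbitrary parameters, and the conditional formula can be checked in the same spirit:

```python
import numpy as np

rng = np.random.default_rng(0)
dz, dx, n = 2, 3, 500_000  # arbitrary dimensions and sample size

# Arbitrary model parameters (covariances built to be positive definite).
mu = rng.normal(size=dz)
L = rng.normal(size=(dz, dz)); Sigma = L @ L.T + np.eye(dz)
A = rng.normal(size=(dx, dz)); b = rng.normal(size=dx)
M = rng.normal(size=(dx, dx)); S = M @ M.T + np.eye(dx)

# Sample z ~ N(mu, Sigma), then x | z ~ N(Az + b, S).
z = rng.multivariate_normal(mu, Sigma, size=n)
x = z @ A.T + b + rng.multivariate_normal(np.zeros(dx), S, size=n)

# Both gaps should be near zero, up to Monte Carlo noise.
print(np.abs(x.mean(axis=0) - (A @ mu + b)).max())
print(np.abs(np.cov(x.T) - (A @ Sigma @ A.T + S)).max())
```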