$30
EECS E6720: Bayesian Models for Machine Learning
Homework 2
Please read these instructions to ensure you receive full credit on your homework.
Submit the written portion of your homework as a single PDF file through Courseworks (less
than 5MB). In addition to your PDF write-up, submit all code written by you in their original
extensions through Courseworks (e.g., .m, .r, .py, etc.). Any coding language is acceptable, but
your code should be your own. Do not submit Jupyter or other notebooks, but the original source
code only. Do not wrap your files in .rar, .zip, .tar and do not submit your write-up in .doc or
other file type. Your grade will be based on the contents of one PDF file and the original source
code. Additional files will be ignored. We will not run your code, so everything you are asked to
show should be put in the PDF file. Show all work for full credit.
Late submission policy: Late homeworks will have 0.1% deducted from the final grade for
each minute late. Your homework submission time will be based on the time of your last submission to Courseworks. Therefore, do not re-submit after midnight on the due date unless you are
confident the new submission is significantly better to overcompensate for the points lost. You
can resubmit as much as you like, but each time you resubmit be sure to upload all files you want
graded! Submission time is non-negotiable and will be based on the time you submitted your last
file to Courseworks. The number of points deducted will be rounded to the nearest integer.
Problem 1. (40 points)
In this homework, you will implement an EM algorithm for the object recommendation problem
discussed in class. Here, we have a data set of the form R = {rij} restricted to a subset of pairs
(i, j) ∈ Ω, where i can range from 1, . . . , N and j can range from 1, . . . , M and we let rij ∈ {±1}.
The goal of matrix factorization is to find a low rank approximation of this data.
To this end, let the unknown model variables be ui ∈ R
d and vj ∈ R
d
for i = 1, . . . , N and
j = 1, . . . , M. Let U = {ui} and V = {vj}. The model and priors are,
rij |U, V ind ∼ Bernoulli(Φ(u
T
i
vj/σ)), for all (i, j) ∈ Ω, ui
iid∼ N(0, cI), vj
iid∼ N(0, cI).
The function Φ(·) is the CDF of a standard normal random variable, as discussed in class. (Hint:
You will find the probit regression discussion useful for this homework.)
The goal of this homework will be to derive and implement an EM algorithm for maximizing
ln p(R, U, V ) = Z
q(φ) ln p(R, U, V, φ)
q(φ)
dφ
| {z }
L(U,V )
+
Z
q(φ) ln q(φ)
p(φ|R, U, V )
dφ,
where the auxiliary variable φ is used in the probit setup similar to that discussed in class for
probit regression: φ = {φij} for (i, j) ∈ Ω and rij = sign(φij ), φij ∼ Normal(u
T
i
vj , σ2
).1
1Note: In the data, rij ∈ {−1, +1} instead of {0, 1}. This won’t impact the final derivation.
1
Problem 1 will focus on deriving the algorithm. To this end:
a) Derive q(φ) for the EM algorithm. Hint: You will be able to show q(φ) = Q
(i,j)∈Ω
q(φij ).
b) Derive L(U, V ). You can use the Eq[·] notation and write the relevant expectation separately.
c) Derive the U, V that optimize L(U, V ). Hint: Derive for one (and therefore all) ui and vj .
d) Summarize the final EM algorithm in a list of steps that are easy to read, yet detailed
enough for someone to be able to implement it. (Be sure to write out ln p(R, U, V ).)
Problem 2. (35 points)
In this problem, you will implement the EM algorithm you derived in Problem 1 for finding a local
optimal solution to ln p(R, U, V ). The data is provided on Courseworks along with a README
file explaining the data. For your implementation, set d = 5, c = 1 and σ
2 = 1. Treat the users
as U and the movies as V when implementing the algorithm.2
You will need to initialize your algorithm in some way. One possibility is to start by generating
each ui and vj from Normal(0, 0.1I). Please use this method for this assignment.
a) Run your algorithm for 100 iterations and plot ln p(R, U, V ) for iterations 2 through 100.
b) Rerun your algorithm for 100 iterations using 5 different random starting points. Plot the
5 different objective functions for iterations 20 through 100. Note: This is simply a repeat
of Problem 2(a), only showing 5 objective functions instead of one and changing the x-axis.
c) Predict the values given in the test set and show your results in a confusion matrix. Show
the raw counts in this confusion matrix.
2Hint: Depending how you implement it, your code may be very fast or very slow. You may want to change
the data format so you aren’t looping over the list of ratings. Also, the EM algorithm will work if you change
around the orders: e.g., update q(φ), then update U, then update q(φ), then update V . We didn’t present EM
this way—we presented it as doing “E” for everything then “M” for everything in one iteration—but it may help
develop your understanding of EM if you think through why this suggestion also works. Your code will work if you
ignore this hint, but it may end up being slow, and this hint is probably not the only way to faster code.
2