Bayesian Models for Machine Learning Homework 1

EECS E6720: Bayesian Models for Machine Learning

Homework 1
Please read these instructions to ensure you receive full credit on your homework.
Submit the written portion of your homework as a single PDF file through Courseworks (less
than 5MB). In addition to your PDF write-up, submit all code you wrote, in its original file
format, through Courseworks (e.g., .m, .r, .py). Any coding language is acceptable, but
your code should be your own. Do not submit Jupyter or other notebooks; submit the original source
code only. Do not wrap your files in .rar, .zip, or .tar archives, and do not submit your write-up as a .doc or
other file type. Your grade will be based on the contents of one PDF file and the original source
code. Additional files will be ignored. We will not run your code, so everything you are asked to
show should be put in the PDF file. Show all work for full credit.
Late submission policy: Late homeworks will have 0.1% deducted from the final grade for
each minute late. Your homework submission time will be based on the time of your last submission to Courseworks. Therefore, do not re-submit after midnight on the due date unless you are
confident the new submission is significantly better to overcompensate for the points lost. You
can resubmit as much as you like, but each time you resubmit be sure to upload all files you want
graded! Submission time is non-negotiable and will be based on the time you submitted your last
file to Courseworks. The number of points deducted will be rounded to the nearest integer.
Problem 1. (10 points)
Your friend is on a gameshow and phones you for advice. She describes her situation as follows:
There are three doors with a prize behind one of the doors and nothing behind the other two.
She randomly picks one of the doors, but before opening it, the gameshow host opens one of
the other two doors to show that it contains no prize. She wants to know whether she should
stay with her original selection or switch doors. What is your suggestion? Calculate the relevant
posterior probabilities to convince her that she should follow your advice.
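A derived answer can be sanity-checked by simulating the game. The following is a minimal Python sketch, not a substitute for the posterior calculation. It assumes the host knows where the prize is, always opens an empty door that was not picked, and chooses at random when two such doors are available. The function names `simulate_game` and `win_rate` are illustrative.

```python
import random

def simulate_game(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host only opens an empty door that was not picked.
    openable = [d for d in doors if d != pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        # Move to the single door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, n_rounds=100_000, seed=0):
    """Fraction of simulated rounds won under the given strategy."""
    rng = random.Random(seed)
    wins = sum(simulate_game(switch, rng) for _ in range(n_rounds))
    return wins / n_rounds

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

The empirical win rates should agree with the posterior probabilities you derive, up to simulation noise.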
Problem 2. (15 points)
Let $\pi = (\pi_1, \dots, \pi_K)$, with $\pi_j \ge 0$ and $\sum_j \pi_j = 1$. Let $X_i \sim \mathrm{Multinomial}(\pi)$, i.i.d. for $i = 1, \dots, N$.
Find a conjugate prior for π and calculate its posterior distribution and identify it by name.
What is the most obvious feature about the parameters of this posterior distribution?
Problem 3. (30 points)
You are given a dataset {x1, . . . , xN}, where each xi ∈ ℕ. You model it as i.i.d. Poisson(λ). Since
you don’t know λ, you model it as λ ∼ Gamma(a, b).
a) Using Bayes rule, calculate the posterior of λ and identify the distribution.
b) Using the posterior, calculate the predictive distribution on a new observation,
$$p(x^* \mid x_1, \dots, x_N) = \int_0^\infty p(x^* \mid \lambda)\, p(\lambda \mid x_1, \dots, x_N)\, d\lambda.$$
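A closed form derived for part (b) can be checked numerically. The sketch below approximates the predictive integral with a midpoint rule. It assumes that Gamma(a, b) uses the shape-rate parameterization, which the problem statement does not specify. The function names, the integration limit `upper`, the grid size, and the example parameter values are all illustrative choices; plug in the posterior parameters you obtain in part (a).

```python
import math

def log_poisson_pmf(x, lam):
    """log Poisson(x | lam) for integer x >= 0 and lam > 0."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_gamma_pdf(lam, shape, rate):
    """log Gamma(lam | shape, rate); shape-rate parameterization (assumed)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(lam) - rate * lam)

def predictive_numeric(x_new, shape, rate, upper=100.0, n_grid=4000):
    """Midpoint-rule approximation of the integral over (0, upper) of
    Poisson(x_new | lam) * Gamma(lam | shape, rate) d lam.
    `upper` must be large enough to cover essentially all of the Gamma mass."""
    h = upper / n_grid
    total = 0.0
    for k in range(n_grid):
        lam = (k + 0.5) * h  # midpoint, so lam = 0 is never evaluated
        total += math.exp(log_poisson_pmf(x_new, lam)
                          + log_gamma_pdf(lam, shape, rate))
    return total * h

if __name__ == "__main__":
    # Hypothetical posterior parameters, for illustration only.
    shape, rate = 3.0, 2.0
    probs = [predictive_numeric(x, shape, rate) for x in range(41)]
    print("sum over x = 0..40:", sum(probs))
```

If the numerical values match your closed-form expression for several values of the new observation, and the probabilities sum to approximately one, the derivation is likely correct. The midpoint rule loses accuracy when the shape parameter is below one, since the density is then unbounded near zero.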
Problem 4. (20 points)
In this problem you will use your derivations from Problem 3 to code a naive Bayes classifier for
distinguishing spam from non-spam emails. The data is provided on Courseworks.
Each 54-dimensional vector x has a label y with y = 0 indicating “non-spam” and y = 1 indicating
“spam”. We model the nth feature vector of a spam email as
$$p(x_n \mid \vec{\lambda}_1, y_n = 1) = \prod_{d=1}^{54} \mathrm{Poisson}(x_{n,d} \mid \lambda_{1,d}),$$
and similarly for class 0. We model the labels as yn ∼ Bernoulli(π). Assume independent gamma
priors on all λ1,d and λ0,d, as in Problem 3, with a = 1 and b = 1. For the label bias assume the
prior π ∼ Beta(e, f) and set e = f = 1.
Let $(x^*, y^*)$ be a new test pair. The goal is to predict $y^*$ given $x^*$. To do this we use the
predictive distribution under the posterior of the naive Bayes classifier. That is, for possible label
$y^* = y \in \{0, 1\}$ we compute
$$p(y^* = y \mid x^*, X, \vec{y}) \propto p(x^* \mid y^* = y, \{x_i : y_i = y\})\, p(y^* = y \mid \vec{y})$$
where $X$ and $\vec{y}$ contain $N$ training pairs of the form $(x_i, y_i)$. This can be calculated as follows:
$$p(x^* \mid y^* = y, \{x_i : y_i = y\}) = \prod_{d=1}^{54} \int_0^\infty p(x^*_d \mid \lambda_{y,d})\, p(\lambda_{y,d} \mid \{x_i : y_i = y\})\, d\lambda_{y,d}$$
The results from Problem 3 can be directly applied here. Also, as discussed in the notes
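$$p(y^* = y \mid \vec{y}) = \int_0^1 p(y^* = y \mid \pi)\, p(\pi \mid \vec{y})\, d\pi$$
which has the solutions
$$p(y^* = 1 \mid \vec{y}) = \frac{e + \sum_n \mathbb{1}(y_n = 1)}{N + e + f} \quad \text{and} \quad p(y^* = 0 \mid \vec{y}) = \frac{f + \sum_n \mathbb{1}(y_n = 0)}{N + e + f}.$$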
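The two label probabilities above translate directly into code. The Python sketch below computes them from a list of 0/1 training labels. It also shows one common way to turn the two unnormalized class scores into probabilities: working in log space, since a product over 54 features can underflow. The log-space normalization is a suggestion rather than something the assignment prescribes, and the function names are illustrative.

```python
import math

def label_predictive(labels, e=1.0, f=1.0):
    """Return (p(y*=0 | y), p(y*=1 | y)) under a Beta(e, f) prior.
    Assumes every entry of `labels` is 0 or 1."""
    n = len(labels)
    n_spam = sum(1 for y in labels if y == 1)
    n_nonspam = n - n_spam
    denom = n + e + f
    return (f + n_nonspam) / denom, (e + n_spam) / denom

def normalize_log_scores(log_score0, log_score1):
    """Convert two unnormalized log scores into probabilities summing to 1.
    Subtracting the maximum first keeps exp() from underflowing to 0/0."""
    m = max(log_score0, log_score1)
    w0 = math.exp(log_score0 - m)
    w1 = math.exp(log_score1 - m)
    return w0 / (w0 + w1), w1 / (w0 + w1)
```

Here each log score would be the sum of the 54 per-feature log predictive terms plus the log of the corresponding label probability.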
a) Using the marginal distributions discussed above, implement this naive Bayes classifier for
binary classification in your preferred language.
b) Make predictions for all data in the testing set by assigning the most probable label to each
feature vector. In a 2 × 2 table, list the total number of spam classified as spam, non-spam
classified as non-spam, as well as the off-diagonal values (i.e., a confusion matrix). Use the
provided ground truth for this evaluation.
c) Pick three misclassified emails and for each email plot its features $x$ compared with $E[\vec{\lambda}_1]$
and $E[\vec{\lambda}_0]$, and give the predictive probabilities for that email. Label the 54 points along
the x-axis with the feature names given in the readme file.
d) Pick the three most ambiguous predictions, i.e., the emails whose predictive probabilities
are the closest to 0.5. Show the same information for these three emails that you showed
in Problem 4(c) above.
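The bookkeeping for parts (b) through (d) can be sketched as follows. This Python sketch assumes you already have, for each test email, the true label and the predictive probability of spam. The function names and the choice of rows for true labels and columns for predicted labels are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 table of counts: rows are true labels, columns are predicted labels."""
    table = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table

def misclassified(y_true, y_pred):
    """Indices of emails whose predicted label differs from the true label."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]

def most_ambiguous(probs_spam, k=3):
    """Indices of the k emails whose spam probability is closest to 0.5."""
    order = sorted(range(len(probs_spam)),
                   key=lambda i: abs(probs_spam[i] - 0.5))
    return order[:k]
```

Predicted labels can be obtained by assigning label 1 whenever the spam probability exceeds 0.5; the indices returned by `misclassified` and `most_ambiguous` then select the emails to plot.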