CSE 250B Homework Three
Your homework must be typeset, and uploaded in PDF format to Gradescope by midnight on
the due date.
1. Bivariate Gaussians. Each of the following scenarios describes a joint distribution over (x, y). In each case,
give the parameters of the (unique) bivariate Gaussian that satisfies these properties.
(a) x has mean 2 and standard deviation 1, y has mean 2 and standard deviation 0.5, and the
correlation between x and y is −0.5.
(b) x has mean 1 and standard deviation 1, and y is equal to x.
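A standard fact that may help here (a reminder, not part of the problem statement): if x and y have standard deviations σx and σy and correlation ρ, then since the correlation is defined as Cov(x, y)/(σx σy), the covariance matrix of the bivariate Gaussian is
\[
\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x\sigma_y \\ \rho\,\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}.
\]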
2. More bivariate Gaussians.
(a) Plot 100 random samples from the Gaussian N(µ, Σ), with
\[
\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix}.
\]
(b) Repeat for
\[
\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 1 & -0.75 \\ -0.75 & 1 \end{pmatrix}.
\]
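One possible way to produce these plots, sketched with NumPy and Matplotlib (our choice of libraries; the random seed is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

# Part (a): zero mean, variance 9 along x and variance 1 along y.
mu = np.array([0.0, 0.0])
sigma = np.array([[9.0, 0.0],
                  [0.0, 1.0]])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=100)

plt.scatter(samples[:, 0], samples[:, 1], s=10)
plt.axis("equal")   # equal axis scaling so the cloud's shape is not distorted
plt.show()

For part (b), swap in the covariance matrix [[1, -0.75], [-0.75, 1]]; the negative off-diagonal entry should tilt the cloud along a line of negative slope.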
3. Linear classification. Consider the linear classifier w · x ≥ θ, where
\[
w = \begin{pmatrix} -3 \\ 4 \end{pmatrix} \quad \text{and} \quad \theta = 12.
\]
Plot the decision boundary in R^2. Make sure to label precisely where the boundary intersects the
coordinate axes, and also indicate which side of the boundary is the positive side.
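A minimal plotting sketch for the boundary w · x = θ (again assuming NumPy and Matplotlib; the axis limits are our choice):

import numpy as np
import matplotlib.pyplot as plt

w = np.array([-3.0, 4.0])
theta = 12.0

# The boundary is the line w[0]*x1 + w[1]*x2 = theta; solve for x2.
x1 = np.linspace(-8.0, 8.0, 100)
x2 = (theta - w[0] * x1) / w[1]

plt.plot(x1, x2)
plt.axhline(0, color="gray", linewidth=0.5)   # coordinate axes for reference
plt.axvline(0, color="gray", linewidth=0.5)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

Reading off where x2 = 0 and x1 = 0 gives the intercepts the problem asks you to label, and evaluating the sign of w · x − θ at any test point tells you which side is positive.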
4. Eigendecomposition of a covariance matrix. Let X ∈ R^p be a random variable, and let Σ denote its
covariance matrix:
\[
\Sigma = \mathbb{E}\left[(X - \mathbb{E}X)(X - \mathbb{E}X)^\top\right].
\]
Suppose Σ has eigenvalues λ1, ..., λp and corresponding eigenvectors u1, ..., up.
(a) We can tell whether Σ is invertible simply by inspecting the λi's. Explain how.
(b) Let c > 0 be any constant. What are the eigenvalues and eigenvectors of Σ + cI?
(c) Assuming Σ is invertible, what are the eigenvalues and eigenvectors of Σ^{-1}?
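A quick way to sanity-check your paper answers to (b) and (c) numerically (a sketch; the example matrix and the constant c are our own choices):

import numpy as np

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # small example covariance matrix (our choice)
c = 0.3

vals, vecs = np.linalg.eigh(sigma)                       # eigenpairs of Sigma
vals_b, vecs_b = np.linalg.eigh(sigma + c * np.eye(2))   # for part (b)
vals_c, vecs_c = np.linalg.eigh(np.linalg.inv(sigma))    # for part (c)

# Compare (vals, vecs) against (vals_b, vecs_b) and (vals_c, vecs_c)
# to check the relationships you derive on paper.
print(vals, vals_b, vals_c)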
5. Handwritten digit recognition using a Gaussian generative model. Recall the MNIST data set from the
first homework. In this problem, you will build a classifier for this data by modeling each class as a
multivariate (784-dimensional) Gaussian:
• Fit a Gaussian to each digit. You will find it helpful to smooth the covariance by adding in cI,
where c is some constant and I is the identity matrix. Use a validation set to help you choose a
good setting of c.
• Use Bayes’ rule to classify new points.
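A minimal sketch of one way to implement the training and prediction steps, assuming NumPy and SciPy (the function names, structure, and validation details are ours, not prescribed by the assignment):

import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X_train, y_train, c):
    """Fit one smoothed Gaussian per digit class, plus class priors."""
    models, priors = {}, {}
    for k in range(10):
        Xk = X_train[y_train == k]
        mu = Xk.mean(axis=0)
        # Smooth the empirical covariance with cI so it is invertible.
        cov = np.cov(Xk, rowvar=False) + c * np.eye(X_train.shape[1])
        models[k] = multivariate_normal(mean=mu, cov=cov)
        priors[k] = len(Xk) / len(X_train)
    return models, priors

def predict(models, priors, x):
    """Bayes rule in log space: argmax_k  log pi_k + log p_k(x)."""
    scores = [np.log(priors[k]) + models[k].logpdf(x) for k in range(10)]
    return int(np.argmax(scores))

A value of c can then be chosen by fitting on part of the training data, measuring error on a held-out validation split, and repeating over a grid of candidate c values.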
Turn in the following:
(a) Pseudocode for your training procedure, making it very clear: (i) how the validation set was
created and used, and (ii) what specific prediction rule was used.
(b) Error rate on the MNIST test set.
(c) Out of the misclassified test digits, pick five at random and display them. For each instance, list
the posterior probabilities Pr(y|x) of each of the ten classes.
6. A classifier for MNIST that occasionally abstains. This is a continuation of the previous problem.
Suppose you are given some f ∈ [0, 1] and you are allowed to abstain on a fraction f of instances –
that is, when asked to make a prediction, you can say “don’t know” a fraction f of the time. You
would like it to be the case that when you do make a prediction, the error rate is lower than what you
obtained earlier. You should think of f as an approximate guideline, not a rigid target.
Fix the parameters (Gaussians and class weights) from the previous problem, but modify the prediction
rule to incorporate occasional abstaining; one possible prediction rule is sketched after part (c) below.
Depending on your strategy, you might need to use your earlier validation set to choose appropriate
thresholds.
(a) Give a description of your strategy that includes the precise equations you used for prediction.
(b) Give pseudocode showing how any parameters are chosen. Note: these should not depend on the
test set in any way.
(c) Plot two curves: (i) test error rate (on points actually classified) versus f, and (ii) fraction of
data on which classifier abstains versus f. Your graph should include at least the specific values
f = 0.05, 0.1, 0.15, 0.2.
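One natural abstention strategy, sketched under our own assumptions (threshold the maximum posterior probability; other rules are possible):

import numpy as np

def predict_or_abstain(log_scores, t):
    """log_scores: the ten values log pi_k + log p_k(x) from the previous problem.
    Predict the argmax class only if its posterior probability reaches t."""
    log_scores = np.asarray(log_scores)
    p = np.exp(log_scores - log_scores.max())
    p = p / p.sum()                    # posterior Pr(y|x) via softmax normalization
    k = int(np.argmax(p))
    return k if p[k] >= t else None    # None means "don't know"

Under this rule, the threshold t for a target fraction f can be chosen on the validation set, e.g. as the empirical f-quantile of the maximum posterior values, so that roughly a fraction f of points fall below it.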