HOMEWORK 7
>>NAME HERE<<
>>ID HERE<<
Instructions: You can choose any programming language, as long as you implement the algorithm from scratch. Use this LaTeX file as a template to develop your homework. Submit your homework on time as a single PDF file to Canvas. Please check Piazza for updates about the homework.
1 Principal Component Analysis [60 pts]
Download three.txt and eight.txt. Each file contains 200 handwritten digits. Each line corresponds to one digit, vectorized from a $16 \times 16$ grayscale image.
1. (10 pts) Each line has 256 numbers: they are pixel values (0 = black, 255 = white), vectorized from the image column by column, i.e., the first column (top-down), then the second column, and so on. Visualize the two grayscale images corresponding to the first line in three.txt and the first line in eight.txt.
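Because the pixels are stored column by column, the 256-vector has to be reshaped in column-major (Fortran) order before display. A minimal sketch, assuming NumPy and the file layout described above (the `digit_to_image` helper name is ours):

```python
import numpy as np

def digit_to_image(row):
    """Turn one 256-number line into a 16x16 image.

    The file stores pixels column by column (first column top-down,
    then the second column, ...), i.e. column-major order.
    """
    return np.asarray(row, dtype=float).reshape(16, 16, order="F")

# Once the data files are available:
#   X3 = np.loadtxt("three.txt")              # shape (200, 256)
#   plt.imshow(digit_to_image(X3[0]), cmap="gray")
# Sanity check with a synthetic vector: entries 0..15 should land in
# the first (leftmost) column of the image.
img = digit_to_image(np.arange(256))
print(img.shape)     # the 16x16 image grid
print(img[:, 0])     # first column holds the first 16 entries
```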
2. (10 pts) Put the two data files together (three first, eight next) to form an $n \times D$ matrix $X$, where $n = 400$ digits and $D = 256$ pixels. Note we use the $n \times D$ shape for $X$ instead of $D \times n$ to be consistent with the convention in linear regression. The $i$-th row of $X$ is $x_i^\top$, where $x_i \in \mathbb{R}^D$ is the $i$-th image in the combined data set. Compute the sample mean $y = \frac{1}{n} \sum_{i=1}^{n} x_i$. Visualize $y$ as a $16 \times 16$ grayscale image.
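The sample mean is just a per-pixel average over the rows of $X$. A sketch, using random stand-in data since the files are not bundled here:

```python
import numpy as np

# Stand-in for the real data; in the assignment, X is built by stacking
# the 200 rows of three.txt on top of the 200 rows of eight.txt, e.g.
#   X = np.vstack([np.loadtxt("three.txt"), np.loadtxt("eight.txt")])
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(400, 256))

y = X.mean(axis=0)     # sample mean over the n = 400 rows
print(y.shape)         # one mean value per pixel: (256,)
# y.reshape(16, 16, order="F") gives the 16x16 image to display.
```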
3. (10 pts) Center $X$ using $y$ above. Then form the sample covariance matrix $S = \frac{X^\top X}{n-1}$. Show the $5 \times 5$ submatrix $S(1 \ldots 5, 1 \ldots 5)$.
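The centering and the covariance formula above translate directly into two NumPy lines; `np.cov` serves as a cross-check, since it uses the same $n-1$ denominator. Again with stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 256))        # stand-in for the real 400x256 data

Xc = X - X.mean(axis=0)                # center every pixel (column)
S = Xc.T @ Xc / (Xc.shape[0] - 1)      # sample covariance, 256x256

print(S.shape)                                   # (256, 256)
print(np.allclose(S, np.cov(X, rowvar=False)))   # matches NumPy: True
print(S[:5, :5])                                 # the submatrix to report
```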
4. (10 pts) Use appropriate software to compute the two largest eigenvalues $\lambda_1 \ge \lambda_2$ and the corresponding eigenvectors $v_1, v_2$ of $S$. For example, in Matlab one can use eigs(S,2). Show the values of $\lambda_1$ and $\lambda_2$. Visualize $v_1, v_2$ as two $16 \times 16$ grayscale images. Hint: their elements will not be in $[0, 255]$, but you can shift and scale them appropriately. It is best if you can show an accompanying 'colorbar' that maps the grayscale to values.
5. (10 pts) Now we project (the centered) $X$ down to the two PCA directions. Let $V = [v_1 \; v_2]$ be the $D \times 2$ matrix. The projection is simply $XV$. Show the resulting two coordinates for the first line in three.txt and the first line in eight.txt, respectively.
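The projection itself is one matrix product. With the stacking order from part 2, the first line of three.txt is row 0 and the first line of eight.txt is row 200. A sketch with stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 256))            # stand-in for the real data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (Xc.shape[0] - 1)
vals, vecs = np.linalg.eigh(S)
V = vecs[:, [-1, -2]]                      # D x 2 matrix [v1 v2]

Z = Xc @ V                                 # n x 2 projected coordinates
print(Z.shape)                             # (400, 2)
# Row 0 is the first line of three.txt; row 200 is the first line of
# eight.txt (threes were stacked first).
print(Z[0], Z[200])
```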
6. (10 pts) Now plot the 2D point cloud of the 400 digits after projection. For visual interest, color the points
in three.txt red and the points in eight.txt blue. But keep in mind that PCA is an unsupervised
learning method, and it does not know such class labels.
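A scatter plot of the projected coordinates only needs the row ranges and two colors; the labels are cosmetic, since PCA never uses them. A matplotlib sketch with a random stand-in for the projection (the output filename is ours):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # draw off-screen, no window
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
Z = rng.normal(size=(400, 2))              # stand-in for the 400x2 matrix XV

# First 200 rows came from three.txt, the rest from eight.txt; the colors
# are only for the reader -- PCA itself never saw these labels.
plt.scatter(Z[:200, 0], Z[:200, 1], c="red", s=10, label="three")
plt.scatter(Z[200:, 0], Z[200:, 1], c="blue", s=10, label="eight")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.savefig("pca_scatter.png")
```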
2 Directed Graphical Model [20 points]
Consider the directed graphical model (aka Bayesian network) in Figure 1.
Homework 6 CS 760 Machine Learning
Figure 1: A Bayesian Network example.
Compute $P(B = t \mid E = f, J = t, M = t)$ and $P(B = t \mid E = t, J = t, M = t)$. These are the conditional probabilities of a burglar in your house (yikes!) when both of your neighbors John and Mary call you and say they hear an alarm in your house, but without or with an earthquake also going on in that area (what a busy day), respectively.
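Both queries can be answered by enumeration: sum out the alarm variable $A$ and normalize over $B$. The CPT numbers below are NOT from Figure 1 (which is not reproduced here); they are the classic textbook alarm-network values, used only as placeholders so the enumeration runs. Substitute the values from Figure 1:

```python
# Placeholder CPTs -- the classic textbook alarm network, NOT Figure 1's
# numbers. Replace these with the values shown in Figure 1.
P_B = 0.001                                # P(B = t)
P_E = 0.002                                # P(E = t), unused once E is observed
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A = t | B, E)
P_J = {True: 0.90, False: 0.05}            # P(J = t | A)
P_M = {True: 0.70, False: 0.01}            # P(M = t | A)

def posterior_burglary(e):
    """P(B = t | E = e, J = t, M = t), summing out the alarm A."""
    def joint(b):
        # P(B = b) * sum_A P(A | b, e) P(J = t | A) P(M = t | A)
        pb = P_B if b else 1 - P_B
        total = 0.0
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += pa * P_J[a] * P_M[a]
        return pb * total
    num = joint(True)
    return num / (num + joint(False))

print(posterior_burglary(False))   # P(B = t | E = f, J = t, M = t)
print(posterior_burglary(True))    # P(B = t | E = t, J = t, M = t)
```

With these placeholder numbers, observing an earthquake "explains away" the alarm, so the burglary posterior drops sharply when $E = t$.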