CS4442B & CS9542B: Artificial Intelligence II – Assignment #2
1 Generative Models, Naive Bayes Classifier [20 points]
Based on the table on page 13 of Lecture 5, compute the following:
(a) [5 points] p(Water = cool|Play = yes), p(Water = cool|Play = no)
(b) [5 points] p(Play = yes|Water = warm), p(Play = no|Water = warm)
(c) [5 points] p(Play = yes|Humid = high), p(Play = yes|Humid = normal)
(d) [5 points] p(Water = cool|Play = yes), p(Water = cool|Play = no) with Laplace smoothing
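Each of the probabilities above is a ratio of counts read off the lecture table; for (d), Laplace (add-one) smoothing adds 1 to the numerator count and the number of attribute values to the denominator count. Below is a minimal Python sketch of the computation; the counts and the number of Water values are hypothetical placeholders, not the actual numbers from Lecture 5.

```python
# Hedged sketch: p(Water = cool | Play = yes) with and without Laplace smoothing.
# The counts below are HYPOTHETICAL; replace them with the counts from the
# table on page 13 of Lecture 5.

n_water_values = 2        # assumption: Water takes 2 values, {cool, warm}
count_play_yes = 9        # hypothetical: number of examples with Play = yes
count_cool_and_yes = 3    # hypothetical: examples with Water = cool and Play = yes

# Maximum-likelihood (unsmoothed) estimate
p_mle = count_cool_and_yes / count_play_yes

# Laplace smoothing: add 1 to the numerator and the number of attribute
# values to the denominator
p_laplace = (count_cool_and_yes + 1) / (count_play_yes + n_water_values)

print(f"p(Water=cool | Play=yes)           = {p_mle:.3f}")
print(f"p(Water=cool | Play=yes), smoothed = {p_laplace:.3f}")
```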
2 Kernels [30 points]
In this problem, we consider constructing new kernels by combining existing kernels. Recall that for
some function k(x, z) to be a kernel, we need to be able to write it as a dot product of vectors in some
high-dimensional feature space defined by φ:
k(x, z) = φ(x)ᵀ φ(z)
Mercer’s theorem gives a necessary and sufficient condition for a function k to be a kernel function: its
corresponding kernel matrix K has to be symmetric and positive semidefinite.
Suppose that k1(x, z) and k2(x, z) are two valid kernels. For each of the cases below, state whether
k is also a valid kernel. If it is, prove it; if it is not, give a counterexample. You can use either
Mercer’s theorem or the definition of a kernel, as needed; a small numerical check that can help when
hunting for counterexamples is sketched after part (c).
(a) [10 points] k(x, z) = k1(x, z)k2(x, z)
(b) [10 points] k(x, z) = f1(x)f1(z) + f2(x)f2(z), where f1, f2 : R^n → R are real-valued functions
(c) [10 points] k(x, z) = k1(x, z) / √(k1(x, x) k1(z, z)), where k1(x, x) > 0 for any x.
3 PCA and Eigenface [50 points]
For this exercise, you will use principal component analysis (PCA) to analyze face images in any programming language of your choice (e.g., Python/Matlab/R). The data set is faces.dat; each row represents an
image (400 images), and each column represents a pixel (64 × 64 = 4096 pixels).
(a) [5 points] Display the 100th image.
(b) [5 points] Remove the mean of the images, and then display the 100th image.
(c) [10 points] Perform PCA on the mean-centered data matrix. You can either implement PCA by
yourself using eigenvalue decomposition of the sample covariance matrix (e.g., the eig function
in Matlab), or use an existing machine learning toolbox (e.g., pca in Matlab). Sort the eigenvalues
in descending order and plot them.
(d) [5 points] You will find that the last (i.e., 400th) eigenvalue is 0. Explain why.
(e) [5 points] Based on the eigenvalues, determine the dimensionality of the data you want to keep
(i.e., how many principal components to keep) so that most of the variance is accounted for.
Explain your reasoning.
(f) [10 points] Display the top-5 leading eigenvectors (corresponding to the top-5 largest eigenvalues)
in 5 figures.
(g) [10 points] Display the reconstructed 100th image using 10, 100, 200, and 399 principal components,
respectively. (Hint: in Lecture 10 (page 19), we learned that x̂ = v vᵀ x if we project x into a
1-dimensional space using the 1st principal component. Reconstructing x̂ using the top-K principal
components is a straightforward extension: x̂ = Σ_{k=1}^{K} v_k v_kᵀ x.)
Hint: in order to display the images and principal components (i.e., 3(a), 3(b), 3(f), 3(g)), you should
first reshape the 4096-dimensional vector into a 64 × 64 matrix (e.g., use reshape in Matlab). Then, in
Matlab, you can use the function imshow to display them.
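The following is a minimal Python (NumPy + Matplotlib) sketch of one way to work through parts (a)–(g). The file name faces.dat comes from the assignment, but the whitespace-delimited loading, the 95% variance threshold in (e), and adding the mean image back for display in (g) are assumptions/illustrative choices, not requirements.

```python
# Hedged sketch of the PCA/eigenface pipeline (NumPy + Matplotlib).
# Assumes faces.dat is a whitespace-delimited 400 x 4096 matrix; adjust the
# loading step if the file is stored differently.
import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt("faces.dat")          # shape (400, 4096); one image per row

def show(vec, title=""):
    """Reshape a 4096-vector into a 64 x 64 image and display it."""
    plt.figure()
    plt.imshow(vec.reshape(64, 64), cmap="gray")
    plt.title(title)
    plt.axis("off")

# (a) the 100th image (row index 99 with 0-based indexing)
show(X[99], "100th image")

# (b) remove the mean image, then display the 100th image again
mean_face = X.mean(axis=0)
Xc = X - mean_face
show(Xc[99], "100th image, mean removed")

# (c) PCA via eigendecomposition of the sample covariance matrix
# (for speed, one could equivalently use np.linalg.svd on Xc)
cov = Xc.T @ Xc / (Xc.shape[0] - 1)            # 4096 x 4096
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order
order = np.argsort(eigvals)[::-1]               # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
plt.figure()
plt.plot(eigvals)
plt.xlabel("component index")
plt.ylabel("eigenvalue")

# (e) pick the number of components that explains e.g. ~95% of the variance
explained = np.cumsum(eigvals) / np.sum(eigvals)
K95 = int(np.searchsorted(explained, 0.95) + 1)
print("components for 95% variance:", K95)

# (f) display the top-5 eigenfaces (leading eigenvectors)
for k in range(5):
    show(eigvecs[:, k], f"eigenface {k + 1}")

# (g) reconstruct the 100th image with the top-K components:
#     x_hat = sum_{k=1..K} v_k v_k^T x  (the mean image is added back for display)
x = Xc[99]
for K in (10, 100, 200, 399):
    V = eigvecs[:, :K]                          # 4096 x K
    x_hat = V @ (V.T @ x)
    show(x_hat + mean_face, f"reconstruction, K={K}")

plt.show()
```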