CS/ECE/ME 532
Homework 5
1. Consider the face emotion classification problem from HW 2. Design and compare the performances
of the classifiers proposed in a) and b) below. In each case, divide the dataset into 8 equal-sized
subsets (e.g., examples 1–16, 17–32, etc.). Use 6 of the subsets to estimate w for each choice of
the regularization parameter, select the best value of the regularization parameter by estimating
the error on one of the two remaining subsets, and finally use the w corresponding to that best
value to predict the labels of the remaining “hold-out” subset. Compute the number of mistakes
made on this hold-out set and divide that number by 16 (the size of the set) to estimate the error
rate. Repeat this process 56 times (for the 8 × 7 different choices of the subsets used to select
the regularization parameter and to estimate the error rate) and average the error rates to obtain
a final estimate.
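The split-and-hold-out procedure above can be organized as in the following minimal MATLAB
sketch. It assumes the HW 2 data are loaded as X (128 × 9) and y (128 × 1, with ±1 labels),
and it calls a hypothetical helper train_w that returns one candidate w per column, one for
each value of the regularization parameter:

    sets = reshape(1:128, 16, 8);    % column j holds the indices of subset j
    err  = zeros(8, 7);
    for h = 1:8                      % h indexes the hold-out subset
        vs = setdiff(1:8, h);
        for j = 1:7
            v   = vs(j);                       % validation subset
            tr  = setdiff(1:8, [h v]);         % remaining 6 subsets train w
            Itr = reshape(sets(:, tr), [], 1);
            Iv  = sets(:, v);  Ih = sets(:, h);
            W = train_w(X(Itr,:), y(Itr));     % hypothetical helper: one w per column
            [~, b] = min(sum(sign(X(Iv,:)*W) ~= y(Iv), 1));
            err(h, j) = sum(sign(X(Ih,:)*W(:, b)) ~= y(Ih)) / 16;
        end
    end
    final_error = mean(err(:));      % average over all 56 repetitions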
a) Truncated SVD solution. Use the pseudo-inverse VΣ^{-1}U^T, where Σ^{-1} is computed
by inverting the k largest singular values and setting the others to zero. Here, k is the
regularization parameter and it takes values k = 1, 2, . . . , 9; i.e., compute 9 different
solutions, ŵ_k.
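A minimal sketch of the truncated-SVD solutions, assuming the six training subsets have been
stacked into Xtr (with labels ytr):

    [U, S, V] = svd(Xtr, 'econ');
    s = diag(S);
    W = zeros(9, 9);                 % column k holds the truncated solution w_k
    for k = 1:9
        sinv = [1 ./ s(1:k); zeros(9 - k, 1)];   % invert the k largest singular values
        W(:, k) = V * (sinv .* (U' * ytr));
    end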
b) Regularized LS. Let ŵ_λ = arg min_w ‖y − Xw‖_2^2 + λ‖w‖_2^2, for the following values of
the regularization parameter: λ = 0, 2^{-1}, 2^0, 2^1, 2^2, 2^3, and 2^4. Show that ŵ_λ can be
computed using the SVD and use this fact in your code.
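For reference, substituting X = UΣV^T into the normal equations (X^T X + λI)ŵ_λ = X^T y
gives ŵ_λ = V(Σ^2 + λI)^{-1}ΣU^T y, so each coefficient is filtered by σ_i/(σ_i^2 + λ).
A minimal sketch using this fact (again assuming training data Xtr, ytr):

    [U, S, V] = svd(Xtr, 'econ');
    s = diag(S);
    lambdas = [0, 2.^(-1:4)];
    W = zeros(size(Xtr, 2), numel(lambdas));   % column i holds w for lambdas(i)
    for i = 1:numel(lambdas)
        f = s ./ (s.^2 + lambdas(i));          % filter factors sigma/(sigma^2 + lambda)
        W(:, i) = V * (f .* (U' * ytr));
    end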
c) Use the original dataset to generate 3 new features for each face, as follows. Take the
3 new features to be random linear combinations of the original 9 features. This can be
done with the Matlab command X*randn(9,3), augmenting the original matrix X with the
resulting 3 columns. Will these new features be helpful for classification? Why or why not?
Repeat the experiments in a) and b) above using the 12 features.
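One way to form the augmented 12-feature matrix; drawing the mixing matrix once and reusing
it across all cross-validation splits keeps the feature definition consistent:

    R   = randn(9, 3);     % fixed random mixing matrix
    X12 = [X, X*R];        % original 9 features plus 3 random combinations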
2. Many sensing and imaging systems produce signals that may be slightly distorted or blurred
(e.g., an out-of-focus camera). In such situations, algorithms are needed to deblur the data to
obtain a more accurate estimate of the true signal. The Matlab code blurring.m generates
a random signal and a blurred and noisy version of it, similar to the example shown below.
The code simulates this equation:
y = Xw + e ,
where y is the blurred and noisy signal, X is a matrix that performs the blurring operation,
w is the true signal, and e is a vector of errors/noise. The goal is to estimate w using y and
X.
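blurring.m is not reproduced here, but a stand-in with the same structure might look like the
following sketch; the signal length, averaging width k, and noise level σ below are illustrative
assumptions, not the script's actual settings:

    n = 100; k = 5; sigma = 0.1;     % assumed sizes; blurring.m sets its own
    w = cumsum(randn(n, 1));         % a random true signal
    X = toeplitz([ones(k,1)/k; zeros(n-k,1)], [1/k, zeros(1,n-1)]);  % k-point average
    y = X*w + sigma*randn(n, 1);     % blurred, noisy observation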
a) Implement the standard LS, truncated SVD, and regularized LS methods for this problem.
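All three estimates can be computed from a single SVD of X; a minimal sketch, where k_trunc
and lambda are placeholder values to be tuned in part b):

    [U, S, V] = svd(X);  s = diag(S);
    w_ls    = X \ y;                        % standard least squares
    k_trunc = 20;                           % placeholder truncation level
    w_tsvd  = V(:,1:k_trunc) * ((U(:,1:k_trunc)'*y) ./ s(1:k_trunc));
    lambda  = 1;                            % placeholder ridge parameter
    w_reg   = V * ((s ./ (s.^2 + lambda)) .* (U'*y));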
b) Experiment with different averaging functions (i.e., different values of k in the code) and
with different noise levels (σ in the code). How do the blurring and noise level affect the
value of the regularization parameters that produce the best estimates?
3. Landweber convergence – REQUIRED FOR GRADUATE STUDENTS ONLY.
Consider the Landweber iteration for solving a standard least-squares problem with
X ∈ R^{m×n}, y ∈ R^m, where X has full column rank. Recall that this iteration begins
with some initial w_0 and then computes

    w_{k+1} = w_k − τ X^T (X w_k − y)   for k = 0, 1, . . .    (1)
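For numerical experiments, the iteration itself is only a few lines of MATLAB; the step size
below uses norm(X), which returns the largest singular value σ_1, so that the condition in
part c) holds:

    tau = 1 / norm(X)^2;             % satisfies 0 < tau < 2/sigma_1^2
    w   = zeros(size(X, 2), 1);      % w_0 = 0
    for k = 1:1000                   % apply iteration (1) a fixed number of times
        w = w - tau * X' * (X*w - y);
    end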
a) We expect the algorithm to converge to w* = (X^T X)^{-1} X^T y. Define the error as
e_k := w_k − w*. Show how to rewrite (1) in the form e_{k+1} = P e_k. What is the matrix P?
b) Define the residual r_k := X w_k − y. Show how to rewrite (1) in the form r_{k+1} = Q r_k.
What is the matrix Q?
c) Let {σ_i} be the singular values of X. Prove that when 0 < τ < 2/σ_1^2, we have
lim_{k→∞} e_k = 0. Hint: substitute the SVD of X into your expression for P.
d) Prove that if X is rank-deficient and w_0 = 0, then the Landweber iteration converges
to the minimum-norm solution. Hint: redo part a) using w* = X†y and see how this
affects part c).