CS/ECE/ME 532
Homework 3
1. Orthogonal columns. Consider the matrix and vector
A = [ 3  1
      0  3
      0  4 ]    and    b = [ 1
                             3
                             1 ] .
a) By hand, find two orthonormal vectors that span the plane spanned by the columns of A.
b) Make a sketch of these vectors and the columns of A in three dimensions.
c) Use these vectors to compute the LS estimate b̂ = A(AᵀA)⁻¹Aᵀb.
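As a check on the hand computation, the projection can be evaluated numerically. This is a minimal sketch assuming NumPy is available; it uses the A and b given in the problem, and the orthonormal basis u1 = (1, 0, 0), u2 = (0, 3/5, 4/5) is one valid answer to part (a):

```python
import numpy as np

A = np.array([[3., 1.], [0., 3.], [0., 4.]])
b = np.array([1., 3., 1.])

# Normal-equations form of the LS estimate: b_hat = A (A^T A)^{-1} A^T b
bhat = A @ np.linalg.solve(A.T @ A, A.T @ b)

# Equivalent projection onto range(A) using the orthonormal basis
U = np.array([[1., 0.], [0., 0.6], [0., 0.8]])
bhat2 = U @ (U.T @ b)

print(bhat)  # [1.   1.56 2.08]
```

Both expressions give the same vector, since A(AᵀA)⁻¹Aᵀ and UUᵀ are the same projection matrix when U spans the columns of A.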
2. Gram-Schmidt. Write your own code to perform Gram-Schmidt orthogonalization. Your
code should take as input a matrix A ∈ R^{m×n} and return as output a matrix U ∈ R^{m×r},
where U has orthonormal columns and the same range as A. Note that r will be the rank of
A, so your code can also be used to find the rank of a matrix!
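One possible shape for such a function, as a sketch in Python/NumPy (the tolerance `tol` for deciding when a column is dependent is a choice I am assuming, not specified in the problem):

```python
import numpy as np

def gram_schmidt(A, tol=1e-10):
    """Gram-Schmidt orthogonalization.

    Returns an m x r matrix U with orthonormal columns and the same
    range as A; r is the numerical rank of A (columns whose residual
    norm falls below tol are treated as dependent and dropped).
    """
    m, n = A.shape
    basis = []
    for j in range(n):
        v = A[:, j].astype(float).copy()
        # subtract the components along the basis vectors found so far
        for u in basis:
            v -= (u @ v) * u
        norm = np.linalg.norm(v)
        if norm > tol:  # keep only linearly independent directions
            basis.append(v / norm)
    return np.column_stack(basis) if basis else np.zeros((m, 0))
```

For the matrix A from problem 1, `gram_schmidt(A)` returns a 3 × 2 matrix, confirming rank 2; feeding it a matrix with dependent columns returns fewer columns than the input.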
3. Design a classifier to detect whether a face image is happy.
Consider the two faces below. It is easy for a human, like yourself, to decide which is happy
and which is not. Can we get a machine to do it?
The key to this classification task is to find good features that may help to discriminate
between happy and mad faces. What features do we pay attention to? The eyes, the mouth,
maybe the brow?
The image below depicts a set of points or “landmarks” that can be automatically detected
in a face image (notice there are points corresponding to the eyes, the brows, the nose, and
the mouth). The distances between pairs of these points can indicate the facial expression,
such as a smile or a frown. We chose n = 9 of these distances as features for a classification
algorithm. The features extracted from m = 128 face images (like the two shown above)
are stored in the m × n matrix A in the Matlab file face_emotion_data.mat. This file also
includes an m × 1 binary vector b; happy faces are labeled +1 and mad faces are labeled −1.
The goal is to find a set of weights for the features in order to predict whether the emotion
of a face image is happy or mad.
a) Use the training data A and b to find a good set of weights.
b) How would you use these weights to classify a new face image as happy or mad?
c) Which features seem to be most important? Justify your answer.
d) Can you design a classifier based on just 3 of the 9 features? Which 3 would you choose?
How would you build a classifier?
e) A common method for estimating the performance of a classifier is cross-validation (CV).
CV works like this. Divide the dataset into 8 equal-sized subsets (e.g., examples 1–16,
17–32, etc.). Use 7 of the subsets to choose your weights, then use the weights to
predict the labels of the remaining “hold-out” set. Compute the number of mistakes
made on this hold-out set and divide that number by 16 (the size of the set) to estimate
the error rate. Repeat this process 8 times (for the 8 different choices of the hold-out
set) and average the error rates to obtain a final estimate.
f) What is the estimated error rate using all 9 features? What is it using the 3 features
you chose in (d) above?
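The least-squares classifier and the cross-validation procedure of parts (a), (b), and (e) can be sketched as follows. This is one possible implementation in Python/NumPy, not the required solution; the function names are my own, and it assumes `face_emotion_data.mat` contains the m × n matrix A and the ±1 label vector b described above:

```python
import numpy as np

def train_weights(A, b):
    """Least-squares weights: minimize ||A w - b||^2."""
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def classify(A, w):
    """Predict +1 (happy) or -1 (mad) from the sign of A w."""
    return np.sign(A @ w)

def cv_error(A, b, n_folds=8):
    """8-fold CV as described in part (e): hold out each block of
    m/n_folds consecutive examples, train on the rest, average the
    per-fold error rates."""
    m = A.shape[0]
    fold = m // n_folds
    errs = []
    for k in range(n_folds):
        hold = np.arange(k * fold, (k + 1) * fold)
        train = np.ones(m, dtype=bool)
        train[hold] = False
        w = train_weights(A[train], b[train])
        pred = classify(A[hold], w)
        errs.append(np.mean(pred != b[hold]))
    return float(np.mean(errs))

# Usage (assuming the course data file is on the path):
# from scipy.io import loadmat
# data = loadmat("face_emotion_data.mat")
# A, b = data["A"], data["b"].ravel()
# print(cv_error(A, b))            # all 9 features
# cols = [0, 1, 2]                 # your 3 chosen features from (d)
# print(cv_error(A[:, cols], b[:, None].ravel() if b.ndim > 1 else b))
```

Restricting to 3 features amounts to running the same procedure on the corresponding 3 columns of A, which is how parts (d) and (f) can be compared against the full 9-feature classifier.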