CSE512 – Machine Learning
Homework 6
This homework contains 2 questions. The last question requires programming. The maximum number
of points is 100 plus 20 bonus points.
1 PCA via Successive Deflation [30 points]
(Adapted from Murphy Exercise 12.7)
Suppose we have a set of $n$ data points $x_1, \dots, x_n$, where each $x_i$ is represented as a
$d$-dimensional column vector. Assume that the data has been centered, i.e., has zero mean:
$\frac{1}{n} \sum_{i=1}^{n} x_i = 0$. Let $X = [x_1; \dots; x_n]$ be the $(d \times n)$ matrix where
column $i$ is equal to $x_i$. Define $C = \frac{1}{n} X X^T$ to be the covariance matrix of $X$, where
$c_{ij} = \frac{1}{n} \sum_{l=1}^{n} x_{il} x_{jl} = \mathrm{covar}(i, j)$.
Next, order the eigenvectors of $C$ by their eigenvalues (largest first), and let $v_1, v_2, \dots, v_k$
be the first $k$ eigenvectors. These satisfy
$$v_i^T v_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
$v_1$ is the first principal eigenvector of $C$ (the eigenvector with the largest eigenvalue), and as such
satisfies $C v_1 = \lambda_1 v_1$. Now define $\tilde{x}_i$ as the orthogonal projection of $x_i$ onto the
space orthogonal to $v_1$:
$$\tilde{x}_i = (I - v_1 v_1^T) x_i$$
Finally, define $\tilde{X} = [\tilde{x}_1; \dots; \tilde{x}_n]$ as the deflated matrix of rank at most
$d - 1$, which is obtained by removing from the $d$-dimensional data the component that lies in the
direction of the first principal eigenvector:
$$\tilde{X} = (I - v_1 v_1^T) X$$
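To make the definitions above concrete, here is a minimal numerical sketch of one deflation step. It is not part of the original handout: the synthetic data, the sizes `d` and `n`, the random seed, and the use of NumPy's `eigh` to obtain the first principal eigenvector are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
d, n = 4, 200                    # hypothetical sizes, not from the handout

# Synthetic data with one point per column, centered so each row has zero mean.
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Covariance C = (1/n) X X^T and its eigen-decomposition.
C = (X @ X.T) / n
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

# Deflation: project every column onto the space orthogonal to v1.
P = np.eye(d) - np.outer(v1, v1)
X_tilde = P @ X

# After deflation, no point should have a component left along v1.
max_component_along_v1 = float(np.max(np.abs(v1 @ X_tilde)))
print(max_component_along_v1)
```

The printed value should be zero up to floating-point round-off, which is just the statement that each deflated point is orthogonal to the first principal eigenvector.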
1. [7 points] Show that the covariance of the deflated matrix,
$$\tilde{C} = \frac{1}{n} \tilde{X} \tilde{X}^T$$
is given by
$$\tilde{C} = \frac{1}{n} X X^T - \lambda_1 v_1 v_1^T$$
(Hint: Some useful facts: $(I - v_1 v_1^T)$ is symmetric, $X X^T v_1 = n \lambda_1 v_1$, and
$v_1^T v_1 = 1$. Also, for any matrices $A$ and $B$, $(AB)^T = B^T A^T$.)
2. [7 points] Show that for $j \neq 1$, if $v_j$ is a principal eigenvector of $C$ with corresponding
eigenvalue $\lambda_j$ (that is, $C v_j = \lambda_j v_j$), then $v_j$ is also a principal eigenvector of
$\tilde{C}$ with the same eigenvalue $\lambda_j$.
3. [8 points] Let $u$ be the first principal eigenvector of $\tilde{C}$. Explain why $u = v_2$. (You may
assume $u$ is unit norm.)
4. [8 points] Suppose we have a simple method $f$ for finding the leading eigenvector and eigenvalue of
a positive-definite matrix, denoted by $[\lambda, u] = f(C)$. Write some pseudocode for finding the first
$k$ principal basis vectors of $X$ that only uses the special $f$ function and simple vector arithmetic.
(Hint: This should be a simple iterative routine that takes only a few lines to write. The input is $C$,
$k$, and the function $f$; the output should be $v_j$ and $\lambda_j$ for $j \in \{1, \dots, k\}$.)
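The question treats $f$ as a given black box. For readers who want something concrete to experiment with, one possible choice of $f$ is power iteration, sketched below. This is an assumption for illustration, not the method the handout prescribes: the iteration count, tolerance, seed, and the small test matrix are all made up, and the sketch covers only $f$ itself, not the routine the question asks you to write.

```python
import numpy as np

def f(C, num_iters=1000, tol=1e-12, seed=0):
    """Power iteration (one possible f): returns (lam, u), an approximation of
    the leading eigenvalue and unit-norm eigenvector of a symmetric
    positive-definite matrix C."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=C.shape[0])      # random start vector
    u = u / np.linalg.norm(u)
    lam = 0.0
    for _ in range(num_iters):
        w = C @ u                        # multiply by C ...
        u_next = w / np.linalg.norm(w)   # ... and renormalize
        lam_next = float(u_next @ C @ u_next)   # Rayleigh quotient estimate
        converged = abs(lam_next - lam) < tol
        u, lam = u_next, lam_next
        if converged:
            break
    return lam, u

# Small made-up symmetric positive-definite matrix.
C = np.array([[4.0, 1.0],
              [1.0, 3.0]])
lam, u = f(C)
print(lam, u)
```

Power iteration converges slowly when the two largest eigenvalues are close, so the defaults above may need adjusting for other matrices.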
2 Action recognition with CNN [70 points + 20 bonus]
In this question, you will train a convolutional neural network (CNN) to classify images and videos using
PyTorch. We use the UCF101 data (see http://crcv.ucf.edu/data/UCF101.php). There are also 10 classes of
data in this homework but the data and the number of classes are different from those of Homework 4. Each
clip has 3 frames and each frame is 64 × 64 pixels. The labels of the training and validation clips are
provided in hw6 data.mat.
You will first train a CNN for action classification for each image. You will then improve the network
architecture and submit the classification results on the test data to Kaggle. Then, you will train a CNN
using 3D convolution for a set of video frames (rather than for individual frames), and submit your results to
Kaggle.
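To see how the frame-level and clip-level settings differ, it can help to work out output sizes by hand. The sketch below uses the usual output-size formula for a convolution with dilation 1. The kernel sizes and padding are assumptions chosen for illustration and are not taken from the notebook; only the 3-frame, 64 × 64 clip shape comes from the text above.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a standard convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

frames, height, width = 3, 64, 64   # clip shape stated in the handout

# Frame-level (2D) convolution: each frame is processed on its own, so only
# height and width are convolved. Assumed: 3x3 kernel, padding 1.
out_2d = (conv_output_size(height, 3, padding=1),
          conv_output_size(width, 3, padding=1))

# Clip-level (3D) convolution: the frame axis is convolved as well.
# Assumed: 3x3x3 kernel, padding 1 on the spatial axes, none on the frame axis.
out_3d = (conv_output_size(frames, 3, padding=0),
          conv_output_size(height, 3, padding=1),
          conv_output_size(width, 3, padding=1))

print(out_2d)  # (64, 64)
print(out_3d)  # (1, 64, 64)
```

With only 3 frames per clip, a temporal kernel of size 3 without padding collapses the frame axis to length 1 after a single layer, which is worth keeping in mind when stacking 3D convolutions.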
The detailed instructions and questions are in the Jupyter notebook Action CNN.ipynb. In this file, there
are 8 ‘ToDo’ spots for you to fill in. The score of each ToDo is specified at the spot. For the 5th and 8th
ToDos, you need to submit CSV result files to Kaggle. The results will be evaluated by Categorization
Accuracy. For the 5th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6img. For the
8th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6vid.
We will maintain a leaderboard for each Kaggle competition, and the top three entries at the end of
the competition (official assignment due date) will receive 10 bonus points. Any submission that rises to
the top three after the assignment deadline is not eligible for bonus points. The ranking will be based on
Categorization Accuracy. Marks for these questions will be scaled according to the ranking on the Private
Leaderboard. To prevent exploitation of the test data, you are allowed a maximum of 2 submissions per 24
hours. Your submission will be evaluated immediately and the leaderboard will be updated.
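Since submissions are limited, it may be worth estimating accuracy on the validation clips before uploading. The sketch below assumes that Categorization Accuracy means the fraction of clips whose predicted class equals the true class. The function name and the label lists are hypothetical and only serve as an example.

```python
def categorization_accuracy(predicted, actual):
    """Fraction of items whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up validation labels and predictions, purely for illustration.
val_labels = [0, 3, 3, 7, 9]
val_preds = [0, 3, 1, 7, 2]
print(categorization_accuracy(val_preds, val_labels))  # 0.6
```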
Environment setting
Please make a ./data folder in the same directory as the Action CNN.ipynb file. Put ./trainClips,
./valClips, ./testClips and hw6 data.mat under ./data.
We recommend using a virtual environment for the project. If you choose not to use a virtual environment,
it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set
up a virtual environment, run the following in the command-line interface:
cd your_hw6_folder
sudo pip install virtualenv # This may already be installed
virtualenv .env # Create a virtual environment
source .env/bin/activate # Activate the virtual environment
pip install -r requirements.txt # Install dependencies
# Note that this does NOT install TensorFlow or PyTorch,
# which you need to do yourself.
# Work (hard) on the assignment
# ... and when you’re done:
deactivate # Exit the virtual environment
Note that every time you want to work on the assignment, you should run ‘source .env/bin/activate’ (from
within your hw6 folder) to re-activate the virtual environment, and deactivate again whenever you are done.
3 What to submit?
3.1 Blackboard submission
You will need to submit both your code and your answers to questions on Blackboard. Put the answer file and
your code in a folder named: SBUID FirstName LastName (e.g., 10947XXXX lionel messi). Zip this folder
and submit the zip file on Blackboard. Your submission must be a zip file, i.e., SBUID FirstName LastName.zip.
The answer file should be named: answers.pdf. The first page of answers.pdf should be the filled cover
page at the end of this homework. The remainder of the answer file should contain:
1. Answers (and derivations) to Question 1
You can use LaTeX if you wish, but it is not compulsory.
3.2 Kaggle submission
For Question 2, you must submit a CSV file to get the accuracy from the competition sites mentioned above.
A submission file should contain two columns: Id and Class. The file should contain a header and have the
following format.
Id, Class
42, 2
43, 5
... ...
Two sample submission files are available from the competition site and our handout.
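One way to produce a file in this layout is Python's standard csv module, as in the sketch below. The helper name `write_submission` and the ids and classes are hypothetical. Whether the real files expect a space after the comma is not clear from the handout, so compare the output against the sample submission files before uploading.

```python
import csv
import io

def write_submission(out, ids, classes):
    """Write an 'Id,Class' CSV with a header row to the open text file `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Class"])
    for clip_id, label in zip(ids, classes):
        writer.writerow([clip_id, label])

# Made-up ids and predicted classes, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, ids=[42, 43], classes=[2, 5])
print(buffer.getvalue())
```

To write to disk instead, open a file with `open(path, "w", newline="")` and pass the handle in place of the buffer.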
4 Cheating warnings
Don’t cheat. You must do the homework yourself, otherwise you won’t learn. You may not ask or discuss
the homework with students from previous years. You may not look up the solution online.
Cover page for answers.pdf
CSE512 Fall 2018 - Machine Learning - Homework 6
Your Name:
Solar ID:
NetID email address:
Names of people whom you discussed the homework with: