
Topological and Geometric Data Analysis 
Homework 1. PCA and MDS

The problem below marked by ∗ is optional, with bonus credits. For the experimental problems,
include source code that is runnable under standard settings. Since there is NO TA assigned for this class, homework will not be graded. But if you would like to submit your exercises,
please send your homework to datascience.hw@gmail.com with the title “CSIC5011:
Homework #”. I will read them and give you bonus credits.
1. PCA experiments: Take any digit data (‘0’, ..., ‘9’), or all of them, from the website
http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/zip.digits/
and perform the following PCA experiments with Matlab or another language you are familiar with (e.g. Python in a Jupyter notebook); a minimal Python sketch is given after this problem:
(a) Set up the data matrix $X = (x_1, \ldots, x_n) \in \mathbb{R}^{p \times n}$;
(b) Compute the sample mean $\hat{\mu}_n$ and form the centered matrix $\tilde{X} = X - \hat{\mu}_n e^T$, where $e \in \mathbb{R}^n$ is the all-ones vector;
(c) Compute the top-$k$ SVD of $\tilde{X} = U S_k V^T$;
(d) Plot the eigenvalue curve, i.e. $i$ vs. $\lambda_i(\hat{\Sigma}_n)/\mathrm{tr}(\hat{\Sigma}_n)$ ($i = 1, \ldots, k$), with the top-$k$ eigenvalues $\lambda_i$ of the sample covariance matrix $\hat{\Sigma}_n = \frac{1}{n}\tilde{X}\tilde{X}^T$, which gives you the variation of the data explained by the principal components;
(e) Use imshow to visualize the mean and the top-$k$ principal components, i.e. the left singular vectors $U = [u_1, \ldots, u_k]$;
(f) For $k = 1$, order the images $x_i$ ($i = 1, \ldots, n$) according to the first right singular vector $v_1$, in ascending order of $v_1(i)$;
(g) For $k = 2$, make a scatter plot of $(v_1, v_2)$ and select a grid on this plane to show the corresponding images on the grid (e.g. Figure 14.23 in the book [ESL]: Elements of Statistical Learning);
(h)* You may try parallel analysis with a permutation test to see how many significant principal components you obtain.
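
A minimal sketch of steps (a)-(g) in Python/numpy follows. It is only an illustration under assumptions: the file name train.3 and the 16x16 row format of the downloaded digit data are assumptions, so adjust the path and parsing to whatever you actually download from the website.

import numpy as np
import matplotlib.pyplot as plt

# (a) Load digit images as columns of X in R^{p x n}; each row of the raw file is
# assumed to be one 16x16 image flattened into p = 256 gray values.
raw = np.loadtxt("train.3")           # hypothetical per-digit file from the course website
X = raw.T                             # shape (p, n)
p, n = X.shape

# (b) Centering: subtract the sample mean from every column.
mu = X.mean(axis=1, keepdims=True)
Xt = X - mu

# (c) Top-k SVD of the centered data matrix.
k = 10
U, s, Vt = np.linalg.svd(Xt, full_matrices=False)
Uk, Vk = U[:, :k], Vt[:k, :].T

# (d) Explained variation: eigenvalues of the sample covariance are lambda_i = s_i^2 / n.
lam = s ** 2 / n
plt.figure()
plt.plot(np.arange(1, k + 1), lam[:k] / lam.sum(), "o-")
plt.xlabel("i"); plt.ylabel("explained variation")
plt.savefig("eigencurve.png")

# (e) Visualize the mean and the top-k principal components (left singular vectors).
fig, axes = plt.subplots(1, k + 1)
axes[0].imshow(mu.reshape(16, 16), cmap="gray"); axes[0].axis("off")
for j in range(k):
    axes[j + 1].imshow(Uk[:, j].reshape(16, 16), cmap="gray"); axes[j + 1].axis("off")
fig.savefig("components.png")

# (f) For k = 1: order the images by the first right singular vector v1 (ascending).
order = np.argsort(Vk[:, 0])

# (g) For k = 2: scatter plot of (v1, v2); images can then be placed on a grid over this plane.
plt.figure()
plt.scatter(Vk[:, 0], Vk[:, 1], s=5)
plt.xlabel("v1"); plt.ylabel("v2")
plt.savefig("scatter.png")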
2. MDS of cities: Go to the following website
http://www.geobytes.com/citydistancetool.htm
Perform the following experiment (a minimal sketch of classical MDS in Python is given after this problem).
(a) Input a few cities of your choice (no fewer than 7) and collect the pairwise air travel distances shown on the website into a matrix $D$;
(b) Write your own code for the Multidimensional Scaling algorithm and apply it to $D$;
(c) Plot the normalized eigenvalues $\lambda_i/(\sum_i \lambda_i)$ in descending order of magnitude and analyze your observations (did you see any negative eigenvalues? If yes, why?);
(d) Make a scatter plot of those cities using the top 2 or 3 eigenvectors, and analyze your observations.
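
A minimal sketch of classical MDS in Python/numpy, assuming the collected distances are stored in a plain text file distances.txt (a hypothetical name) and the city labels are filled in by hand; negative eigenvalues are handled only by clipping when embedding.

import numpy as np
import matplotlib.pyplot as plt

cities = ["CityA", "CityB", "CityC", "CityD", "CityE", "CityF", "CityG"]  # replace with your cities
D = np.loadtxt("distances.txt")       # hypothetical file holding the collected distance matrix
n = D.shape[0]

# (b) Classical MDS: double-center the squared distances, B = -1/2 * H * D^(2) * H,
# with the centering matrix H = I - (1/n) e e^T.
H = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * H @ (D ** 2) @ H

# Eigendecomposition of the symmetric matrix B, sorted in descending order.
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1]
w, V = w[idx], V[:, idx]

# (c) Normalized eigenvalues; negative ones may appear because spherical (air) distances
# need not be Euclidean distances of points in a flat space.
plt.figure()
plt.plot(np.arange(1, n + 1), w / w.sum(), "o-")
plt.savefig("mds_eigenvalues.png")

# (d) Embed with the top 2 eigenvectors, scaled by square roots of the (clipped) eigenvalues.
Y = V[:, :2] * np.sqrt(np.maximum(w[:2], 0))
plt.figure()
plt.scatter(Y[:, 0], Y[:, 1])
for i, name in enumerate(cities):
    plt.annotate(name, Y[i])
plt.savefig("mds_cities.png")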
3. Positive semi-definiteness: Recall that an $n$-by-$n$ real symmetric matrix $K$ is called positive semi-definite (p.s.d., or $K \succeq 0$) iff for every $x \in \mathbb{R}^n$, $x^T K x \geq 0$.
(a) Show that $K \succeq 0$ if and only if its eigenvalues are all nonnegative.
(b) Show that $d_{ij} = K_{ii} + K_{jj} - 2K_{ij}$ is a squared distance function, i.e. there exist vectors $u_i \in \mathbb{R}^n$ ($1 \leq i \leq n$) such that $d_{ij} = \|u_i - u_j\|^2$.
(c) Let $\alpha \in \mathbb{R}^n$ be a signed measure s.t. $\sum_i \alpha_i = 1$ (or $e^T \alpha = 1$), and let $H_\alpha = I - e\alpha^T$ be the Householder centering matrix. Show that $B_\alpha = -\frac{1}{2} H_\alpha D H_\alpha^T \succeq 0$ for the matrix $D = [d_{ij}]$ (an empirical check of (b) and (c) is sketched after this problem).
(d) If $A \succeq 0$ and $B \succeq 0$ ($A, B \in \mathbb{R}^{n \times n}$), show that $A + B = [A_{ij} + B_{ij}]_{ij} \succeq 0$ (elementwise sum) and $A \circ B = [A_{ij} B_{ij}]_{ij} \succeq 0$ (Hadamard product or elementwise product).
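
A short empirical check of parts (b) and (c), assuming only numpy; it is a sanity experiment on a random instance, not a proof.

import numpy as np

rng = np.random.default_rng(0)
n = 8
G = rng.standard_normal((n, n))
K = G @ G.T                                              # a random p.s.d. matrix

# d_ij = K_ii + K_jj - 2 K_ij from part (b).
D = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K

# A signed measure alpha with e^T alpha = 1, and the centering matrix H_alpha = I - e alpha^T.
alpha = rng.standard_normal(n)
alpha = alpha / alpha.sum()
H = np.eye(n) - np.outer(np.ones(n), alpha)

# B_alpha = -1/2 H_alpha D H_alpha^T should be p.s.d. (up to rounding error).
B = -0.5 * H @ D @ H.T
print("smallest eigenvalue of B_alpha:", np.linalg.eigvalsh(B).min())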
4. Distance: Suppose that $d : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$ is a distance function (a small numerical probe of the metric axioms is sketched after this problem).
(a) Is $d^2$ a distance function? Prove or give a counterexample.
(b) Is $\sqrt{d}$ a distance function? Prove or give a counterexample.
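
If helpful, here is a small empirical probe (not a substitute for a proof) that tests the triangle inequality on random triples; the Euclidean distance is used as an example d, and the helper name check_triangle is an assumption of this sketch.

import numpy as np

rng = np.random.default_rng(1)

def d(x, y):
    """Example distance function: the Euclidean distance."""
    return np.linalg.norm(x - y)

def check_triangle(f, trials=10000, p=3):
    """Return True iff f(x, z) <= f(x, y) + f(y, z) held on all sampled triples."""
    for _ in range(trials):
        x, y, z = rng.standard_normal((3, p))
        if f(x, z) > f(x, y) + f(y, z) + 1e-12:
            return False
    return True

print("d^2     :", check_triangle(lambda x, y: d(x, y) ** 2))
print("sqrt(d) :", check_triangle(lambda x, y: np.sqrt(d(x, y))))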
5. ∗Singular Value Decomposition: The goal of this exercise is to refresh your memory about
the singular value decomposition and matrix norms. A good reference to the singular value
decomposition is Chapter 2 in this book:
Matrix Computations, Golub and Van Loan, 3rd edition.
(a) Existence: Prove the existence of the singular value decomposition. That is, show that if $A$ is an $m \times n$ real-valued matrix, then $A = U\Sigma V^T$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$ (where $p = \min\{m, n\}$) is an $m \times n$ diagonal matrix. It is customary to order the singular values in decreasing order: $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0$. Determine to what extent the SVD is unique. (See Theorem 2.5.2, page 70, in Golub and Van Loan.)
(b) Best rank-$k$ approximation (operator norm): Prove that the “best” rank-$k$ approximation of a matrix in the operator norm sense is given by its SVD. That is, if $A = U\Sigma V^T$ is the SVD of $A$, then $A_k = U\Sigma_k V^T$ (where $\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$ is a diagonal matrix containing the largest $k$ singular values) is a rank-$k$ matrix that satisfies
$$\|A - A_k\| = \min_{\mathrm{rank}(B) = k} \|A - B\|.$$
(Recall that the operator norm of $A$ is $\|A\| = \max_{\|x\| = 1} \|Ax\|$. See Theorem 2.5.3, page 72, in Golub and Van Loan.)
(c) Best rank-$k$ approximation (Frobenius norm): Show that the SVD also provides the best rank-$k$ approximation in the Frobenius norm, that is, $A_k = U\Sigma_k V^T$ satisfies
$$\|A - A_k\|_F = \min_{\mathrm{rank}(B) = k} \|A - B\|_F.$$
(d) Schatten $p$-norms: A matrix norm $\|\cdot\|$ that satisfies
$$\|QAZ\| = \|A\|$$
for all orthogonal matrices $Q$ and $Z$ is called a unitarily invariant norm. The Schatten $p$-norm of a matrix $A$ is given by the $\ell_p$ norm ($p \geq 1$) of its vector of singular values, namely,
$$\|A\|_p = \Big(\sum_i \sigma_i^p\Big)^{1/p}.$$
Show that the Schatten $p$-norm is unitarily invariant. Note that the case $p = 1$ is sometimes called the nuclear norm of the matrix, the case $p = 2$ is the Frobenius norm, and $p = \infty$ is the operator norm.
(e) Best rank-$k$ approximation for unitarily invariant norms: Show that the SVD provides the best rank-$k$ approximation for any unitarily invariant norm. See also 7.4.51 and 7.4.52 in:
Matrix Analysis, Horn and Johnson, Cambridge University Press, 1985.
(f) Closest rotation: Given a square $n \times n$ matrix $A$ whose SVD is $A = U\Sigma V^T$, show that its closest (in the Frobenius norm) orthogonal matrix $R$ (satisfying $RR^T = R^T R = I$) is given by $R = UV^T$. That is, show that
$$\|A - UV^T\|_F = \min_{RR^T = R^T R = I} \|A - R\|_F,$$
where $A = U\Sigma V^T$. In other words, $R$ is obtained from the SVD of $A$ by dropping the diagonal matrix $\Sigma$. Use this observation to conclude what the optimal rotation is that aligns two sets of points $p_1, p_2, \ldots, p_n$ and $q_1, \ldots, q_n$ in $\mathbb{R}^d$, that is, find $R$ that minimizes $\sum_{i=1}^n \|Rp_i - q_i\|^2$ (a numerical sketch of this alignment is given after the references below). See also (the papers are posted on the course website):
• [Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D., “Least-squares fitting of two
3-D point sets”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9
(5), pp. 698–700, 1987.
• [Keller75] Keller, J. B., “Closest Unitary, Orthogonal and Hermitian Operators to a
Given Operator”, Mathematics Magazine, 48 (4), pp. 192–197, 1975.
• [FanHoffman55] Fan, K. and Hoffman, A. J., “Some Metric Inequalities in the Space of
Matrices”, Proceedings of the American Mathematical Society, 6 (1), pp. 111–116, 1955.
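
A small numerical sketch of the alignment in (f), assuming numpy. It recovers the optimal orthogonal matrix from the SVD of $\sum_i p_i q_i^T$ by dropping the singular values, as in the problem; the determinant correction that restricts to proper rotations (cf. [Arun87]) is omitted here.

import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 20
P = rng.standard_normal((d, n))                  # points p_1, ..., p_n as columns

# Generate q_i as a rotated, slightly noisy copy of p_i so the recovered R is meaningful.
R_true, _ = np.linalg.qr(rng.standard_normal((d, d)))   # a random orthogonal matrix
Q = R_true @ P + 0.01 * rng.standard_normal((d, n))

# Optimal orthogonal R minimizing sum_i ||R p_i - q_i||^2: take the SVD of
# S = sum_i p_i q_i^T = P Q^T and keep only the singular vectors.
U, _, Vt = np.linalg.svd(P @ Q.T)
R_hat = Vt.T @ U.T                               # the closest orthogonal matrix to Q P^T

print("alignment residual :", np.linalg.norm(R_hat @ P - Q))
print("||R_hat - R_true|| :", np.linalg.norm(R_hat - R_true))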
