COMS 4771 HW 5
This homework is to be done alone. No late homeworks are allowed. To receive credit, a typeset copy of the homework pdf must be uploaded to Gradescope by the due date. You must show your work to receive full credit. Discussing possible solutions for homework questions is encouraged on piazza and with your peers, but you must write your own individual solutions. You should cite all resources (including online material, books, articles, help taken from specific individuals, etc.) you used to complete your work.
1 Improving the confidence
Given a collection of models $\mathcal{F}$, suppose you were able to develop an algorithm $A : (x_i, y_i)_{i=1}^{n} \mapsto f^A_n$ (that is, given $n$ labeled training samples, $A$ returns a model $f^A_n \in \mathcal{F}$) that has the following property: for all $\epsilon > 0$, with probability 0.55 (over the draw of $n = O(\frac{1}{\epsilon^2})$ samples),
$$\mathrm{err}(f^A_n) - \inf_{f \in \mathcal{F}} \mathrm{err}(f) \le \epsilon,$$
where $\mathrm{err}(f) := \mathbb{P}_{(x,y)}[f(x) \neq y]$.
Show that one can construct an algorithm $B : (x_i, y_i)_{i=1}^{n'} \mapsto f^B_{n'}$ with the property: for all $\epsilon > 0$ and all $\delta > 0$, with probability at least $1 - \delta$ over a draw of $n'$ samples:
$$\mathrm{err}(f^B_{n'}) - \inf_{f \in \mathcal{F}} \mathrm{err}(f) \le \epsilon.$$
Moreover, show that $n' = \mathrm{poly}(\frac{1}{\epsilon}, \frac{1}{\delta})$ samples are enough for the algorithm $B$ to return such a model $f^B_{n'}$. Hence, the model class $\mathcal{F}$ is efficiently PAC-learnable.
(Hint: Algorithm $B$ can make multiple calls to the algorithm $A$.)
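To make the hint concrete, here is a minimal illustrative sketch (not the required proof) of one natural way $B$ might use repeated calls to $A$: split the sample into chunks, run $A$ on each, and select among the returned models using held-out data. The names (`boost_confidence`, `empirical_error`, the callable `A`) and the chunk/validation sizes are placeholders and assumptions; the problem asks you to pin down the actual sample sizes and prove the guarantee.

```python
import math
import random

def empirical_error(f, samples):
    # Fraction of held-out samples that f misclassifies.
    return sum(f(x) != y for x, y in samples) / len(samples)

def boost_confidence(A, samples, delta):
    # Each run of A succeeds with probability >= 0.55, so O(log(1/delta))
    # independent runs contain a good model with high probability.
    # The constant 10 here is an arbitrary illustrative choice.
    k = max(1, math.ceil(10 * math.log(2 / delta)))
    samples = list(samples)
    random.shuffle(samples)
    m = len(samples) // (k + 1)                  # chunk size per call to A (an assumption)
    chunks = [samples[i * m:(i + 1) * m] for i in range(k)]
    validation = samples[k * m:]                 # held-out data used only for selection
    candidates = [A(chunk) for chunk in chunks]
    # Return the candidate with the smallest validation error.
    return min(candidates, key=lambda f: empirical_error(f, validation))
```

A concentration bound (e.g. Hoeffding) on the validation errors is the standard tool for arguing that the selection step does not lose much.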
2 Non-linear Dimensionality Reduction
Here is a simple way to accomplish non-linear dimensionality reduction:
Input: High-dimensional dataset $X = x_1, \dots, x_n \in \mathbb{R}^D$, target dimension $d$
Output: $y_1, \dots, y_n \in \mathbb{R}^d$ as the low-dimensional mapping of the given dataset
- Construct a $k$-nearest neighbor graph $G$ on $X$. (A $k$-nearest neighbor graph is simply a graph whose nodes correspond to the datapoints; for each node/datapoint, only the $k$ closest nodes are connected, with the edge weight being the Euclidean distance between the nodes.)
- Let $\pi_{ij}$ denote the shortest-path distance between datapoints $x_i$ and $x_j$ according to $G$.
- Select $y_1, \dots, y_n \in \mathbb{R}^d$ according to the following minimization problem:
$$\min_{y_1, \dots, y_n} \; \sum_{i,j} \big( \|y_i - y_j\| - \pi_{ij} \big)^2$$
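For reference when working on parts (iii) and (iv) below, here is a minimal sketch of the procedure just described, assuming NumPy and SciPy are available. The function name `nonlinear_embed`, the hyperparameters ($k$, learning rate, iteration count), and the plain gradient-descent solver are illustrative choices, not requirements of the assignment.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def nonlinear_embed(X, d=2, k=10, n_iters=1000, lr=1e-3, seed=0):
    n = X.shape[0]
    # Pairwise Euclidean distances (O(n^2) memory; fine for small datasets).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # k-nearest neighbor graph G: each point is connected to its k closest
    # neighbors, with edge weight equal to the Euclidean distance.
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    G = csr_matrix((D[rows, cols], (rows, cols)), shape=(n, n))
    # Shortest-path distances pi_ij along the graph (assumes G is connected;
    # otherwise some entries are infinite and k should be increased).
    pi = shortest_path(G, directed=False)
    # Gradient descent on the stress objective sum_{i,j} (||y_i - y_j|| - pi_ij)^2.
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, d))
    for _ in range(n_iters):
        diff = Y[:, None, :] - Y[None, :, :]                 # shape (n, n, d)
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, 1.0)                          # avoid division by zero
        coef = 2.0 * (dist - pi) / dist
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)   # d/dy_i, counting terms (i,j) and (j,i)
        Y -= lr * grad                                       # lr will likely need tuning per dataset
    return Y
```

For example, `Y = nonlinear_embed(X, d=2)` followed by a scatter plot of `Y` gives a 2D embedding of the kind asked for in part (iv).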
(i) What is the derivative of the objective function above with respect to a fixed $y_i$?
(ii) Is the optimization above convex with respect to a fixed $y_i$? Why or why not?
(iii) Write a program in your preferred language to find a low-dimensional embedding of any given input dataset. You must submit your code to Courseworks to receive full credit.
(iv) For the two datasets provided, compute your two-dimensional embedding. Plot the original 3D data, its 2D PCA projection, and the results obtained from your 2D embedding. Analyze the quality of the results you obtain. Under what circumstances would
• this non-linear embedding fail/succeed?
• PCA perform better/worse than this non-linear embedding?