Machine Learning Foundations
Homework #3

Unless granted by the instructor in advance, you must turn in a printed/written copy of your solutions
(without the source code) for all problems.
For problems marked with (*), please follow the guidelines on the course website and upload your source
code to designated places. You are encouraged to (but not required to) include a README to help the
TAs check your source code. Any programming language/platform is allowed.

Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source code to your classmates at any time.
You should write your solutions in English or Chinese with the common math notations introduced in
class or in the problems. We do not accept solutions written in any other languages.
This homework set comes with 200 points and 20 bonus points. In general, every homework set comes with a full credit of 200 points, plus some possible bonus points.
1. (80 points) Go register for the Coursera version of the second part of the class (https://www.coursera.org/teach/ntumlone-algorithmicfoundations/) and solve its Homework 3. The
registration should be totally free. Then, record the highest score that you get within up to 3
trials. Please print out a snapshot of your score as evidence. (Hint: The problems below are
simple extensions of the Coursera problems.)
2. (40 points) Consider using SGD on the error function err(w) = max(0, −y w^T x), 'ignoring' the singular
points at which it is not differentiable. Prove or disprove that the resulting algorithm is PLA.
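(Hint, a sketch only and not the required argument: on a single example (x, y), and taking the gradient to be 0 at the non-differentiable points, the per-example gradient and the SGD update are

\nabla_{\mathbf{w}} \operatorname{err}(\mathbf{w}) =
\begin{cases}
  -y\,\mathbf{x}, & \text{if } y\,\mathbf{w}^T \mathbf{x} < 0, \\
  \mathbf{0},     & \text{if } y\,\mathbf{w}^T \mathbf{x} > 0,
\end{cases}
\qquad
\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t - \eta\, \nabla_{\mathbf{w}} \operatorname{err}(\mathbf{w}_t);

it remains to compare this update with the PLA update rule.)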
3. (40 points) Write down the derivation steps of Question 17 of Homework 3 on Coursera.
4. (20 points, *) For Questions 19 and 20 of Homework 3 on Coursera, plot a figure that shows Ein(wt)
as a function of t for both the gradient descent version and the stochastic gradient descent version
on the same figure. Describe your findings. Please print out the figure for grading.
5. (20 points, *) For Questions 19 and 20 of Homework 3 on Coursera, plot a figure that shows Eout(wt)
as a function of t for both the gradient descent version and the stochastic gradient descent version
on the same figure. Describe your findings. Please print out the figure for grading.
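For Problems 4 and 5, the following is a minimal sketch of how the two plots could be produced. The file names hw3_train.dat and hw3_test.dat, the data format (whitespace-separated, label in the last column), the parameters eta = 0.001 and T = 2000, the cyclic choice of examples for SGD, and the use of the 0/1 error for Ein and Eout are all assumptions here; adjust them to match the actual Coursera problem statement.

import numpy as np
import matplotlib.pyplot as plt

def load(path):
    # Assumed format: one example per row, features first, label (+1/-1) last.
    data = np.loadtxt(path)
    X = np.hstack([np.ones((data.shape[0], 1)), data[:, :-1]])  # prepend x0 = 1
    return X, data[:, -1]

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def zero_one_error(w, X, y):
    return np.mean(np.sign(X @ w) != y)

def grad_full(w, X, y):
    # gradient of Ein(w) = (1/N) * sum_n ln(1 + exp(-y_n w^T x_n))
    return np.mean((sigmoid(-y * (X @ w)) * -y)[:, None] * X, axis=0)

def grad_single(w, x, y):
    # the same gradient, evaluated on a single example
    return sigmoid(-y * (x @ w)) * (-y) * x

X_tr, y_tr = load("hw3_train.dat")    # hypothetical file names
X_te, y_te = load("hw3_test.dat")
eta, T = 0.001, 2000                  # assumed parameters
w_gd = np.zeros(X_tr.shape[1])
w_sgd = np.zeros(X_tr.shape[1])
curves = {"GD Ein": [], "SGD Ein": [], "GD Eout": [], "SGD Eout": []}
for t in range(T):
    w_gd = w_gd - eta * grad_full(w_gd, X_tr, y_tr)
    n = t % len(y_tr)                 # cycle through the examples for SGD
    w_sgd = w_sgd - eta * grad_single(w_sgd, X_tr[n], y_tr[n])
    curves["GD Ein"].append(zero_one_error(w_gd, X_tr, y_tr))
    curves["SGD Ein"].append(zero_one_error(w_sgd, X_tr, y_tr))
    curves["GD Eout"].append(zero_one_error(w_gd, X_te, y_te))
    curves["SGD Eout"].append(zero_one_error(w_sgd, X_te, y_te))

for fname, keys, ylab in [("p4_ein.png", ["GD Ein", "SGD Ein"], "Ein(wt)"),
                          ("p5_eout.png", ["GD Eout", "SGD Eout"], "Eout(wt)")]:
    plt.figure()
    for k in keys:
        plt.plot(curves[k], label=k)
    plt.xlabel("t")
    plt.ylabel(ylab)
    plt.legend()
    plt.savefig(fname)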
Bonus: Smart ‘Cheating’
6. (Bonus 20 points) For a regression problem, the root-mean-square-error (RMSE) of a hypothesis h
on a test set {(x_n, y_n)}_{n=1}^{N} is defined as

RMSE(h) = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} \bigl( y_n - h(x_n) \bigr)^2 }.
Please consider a case where all of the (test) x_n are known, none of the y_n are known, but you are allowed
to query RMSE(h) for some hypotheses h.
For any given set of hypotheses {h_1, h_2, ..., h_K}, assume that every RMSE(h_k) = e_k has been
queried and is thus known. Also, assume that RMSE(h_0) = e_0 is known, where h_0(x) = 0 for any x.
Let H(x) = \sum_{k=1}^{K} w_k h_k(x). Derive a closed-form solution to

\min_{w_1, w_2, \ldots, w_K} RMSE(H)

in terms of e_0, e_1, ..., e_K, h_0, h_1, ..., h_K, and x_1, x_2, ..., x_N. (The solution can be used to optimize the
"test" performance of an aggregation of all the h_k's, which is a common trick used in data mining
competitions to climb the leaderboard.)
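The following is a small numerical sanity check of one possible line of attack, not the requested derivation. It uses synthetic data, and the names (Phi, b, etc.) are purely illustrative: the queried squared RMSE values reveal the inner products y^T h_k, after which the weights follow from a standard least-squares step that never looks at y directly.

import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 5
Phi = rng.normal(size=(N, K))    # Phi[n, k] = h_k(x_n), computable from the known x_n
y = rng.normal(size=N)           # hidden test labels, used only to simulate the queries

# Simulated query results: e_k = RMSE(h_k), and e_0 = RMSE(h_0) with h_0 = 0.
e0 = np.sqrt(np.mean(y ** 2))
e = np.sqrt(np.mean((y[:, None] - Phi) ** 2, axis=0))

# From e_k^2 = (1/N)(||y||^2 - 2 y^T h_k + ||h_k||^2) and e_0^2 = (1/N)||y||^2:
#   y^T h_k = (N * e0^2 + ||h_k||^2 - N * e_k^2) / 2
b = (N * e0 ** 2 + np.sum(Phi ** 2, axis=0) - N * e ** 2) / 2.0

# Minimizing RMSE(H) is minimizing ||y - Phi w||^2, whose normal equations are
# (Phi^T Phi) w = Phi^T y = b, and everything on the left is known from the x_n alone.
w_cheat = np.linalg.solve(Phi.T @ Phi, b)
w_direct = np.linalg.lstsq(Phi, y, rcond=None)[0]   # uses y, only for verification
print(np.allclose(w_cheat, w_direct))               # expected: True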