$30
Machine Learning Foundations
Homework #1
.
Unless granted by the instructor in advance, you must turn in a printed/written copy of your solutions
(without the source code) for all problems.
For problems marked with (*), please follow the guidelines on the course website and upload your source
code to designated places. You are encouraged to (but not required to) include a README to help the
TAs check your source code. Any programming language/platform is allowed.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source codes to your classmates at any time.
You should write your solutions in English or Chinese with the common math notations introduced in
class or in the problems. We do not accept solutions written in any other languages.
This homework set comes with 200 points and 20 bonus points. In general, every homework set would come with a full credit of 200 points, with some possible bonus points.
1. (80 points) Go register for the Coursera version of the first part of the class ( https://www.
coursera.org/teach/ntumlone-mathematicalfoundations/ ) and solve its homework 1. The
registration should be totally free. Then, record the highest score that you get within up to 3
trials. Please print out a snapshot of your score as an evidence. (Hint: The problems below are
simple extensions of the Coursera problems.)
Problems 2-3 are about Off-Training-Set error.
Let X = {x1, x2, . . . , xN , xN+1, . . . , xN+L} and Y = {−1, +1} (binary classification). Here the set of
training examples is D =
n
(xn, yn)
oN
n=1
, where yn ∈ Y, and the set of test inputs is n
xN+`
oL
`=1
. The
Off-Training-Set error (OT S) with respect to an underlying target f and a hypothesis g is
EOT S(g, f) = 1
L
X
L
`=1
Jg(xN+`) 6= f(xN+`)K .
2. (20 points) Consider f(x) = +1 for all x and
g(x) =
+1, for x = xk and k is odd and 1 ≤ k ≤ N + L
−1, otherwise .
What is the value of EOT S(g, f)? Please provide proof of your answer.
3. (20 points) A determistic algorithm A is defined as a procedure that takes D as an input, and
outputs a hypothesis g. For any two deterministic algorithms A1 and A2, if all those f that can
“generate” D in a noiseless setting are equally likely in probability, please prove or disprove that
Ef
n
EOT S
A1(D), fo
= Ef
n
EOT S
A2(D), fo
.
1 of