Machine Learning Foundations
Homework #2
Unless granted an exception by the instructor in advance, you must turn in a printed/written copy of your solutions
(without the source code) for all problems.
For problems marked with (*), please follow the guidelines on the course website and upload your source
code to designated places. You are encouraged to (but not required to) include a README to help the
TAs check your source code. Any programming language/platform is allowed.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source code to your classmates at any time.
You should write your solutions in English or Chinese with the common math notations introduced in
class or in the problems. We do not accept solutions written in any other languages.
This homework set comes with 200 points and 20 bonus points. In general, every homework set comes with a full credit of 200 points, plus some possible bonus points.
1. (80 points) Go register for the Coursera version of the first part of the class (https://www.coursera.org/learn/ntumlone-mathematicalfoundations/) and solve its homework 2. The
registration should be totally free. Then, record the highest score that you get within up to 3
trials. Please print out a snapshot of your score as evidence. (Hint: The problems below are
simple extensions of the Coursera problems.)
2. (20 points) Consider the “negative thick line in R^2” hypothesis set, which includes hypotheses that
return +1 when |w_0 + w_1 x_1 + w_2 x_2| ≤ θ and −1 elsewhere. Show that the VC dimension of this
hypothesis set is no less than 4. Please provide a proof of your answer.
3. (20 points) Consider the “triangle waves” hypothesis set on R, which is given by
H = {h_α | h_α(x) = sign(|(αx mod 4) − 2| − 1), α ∈ R}.
Here (x mod 4) is the number x − 4k for some integer k such that x − 4k ∈ [0, 4). For instance,
(11.26 mod 4) is 3.26, and (−11.26 mod 4) is 0.74. What is the VC dimension of such an H? Please
prove your answer.
4. (20 points) For any two hypothesis sets H1 and H2 that come with non-empty intersection, prove
that dvc(H1 ∩ H2) ≤ dvc(H2). (Hint: This may partially help you solve Questions 14 and 15 on
Coursera.)
5. (20 points) Consider H1 as the positive-ray hypothesis set (as discussed in class), and H2 as the
negative-ray hypothesis set (which contains the negation of each hypothesis in H1). We showed
that m_{H1}(N) = N + 1 in class. Write down m_{H1 ∪ H2}(N) and use that to calculate dvc(H1 ∪ H2).
(Hint: This may partially help you solve Question 15 on Coursera.)
6. (20 points) In this problem and the next, you will play with the decision stump algorithm. In class, we
taught about the learning model of “positive and negative rays” (which is simply the one-dimensional
perceptron) for one-dimensional data. The model contains hypotheses of the form:
h_{s,θ}(x) = s · sign(x − θ).
The model is frequently named the “decision stump” model and is one of the simplest learning
models. As shown in class, for one-dimensional data, the VC dimension of the decision stump
model is 2.
In fact, the decision stump model is one of the few models for which we can efficiently minimize Ein,
simply by enumerating all possible thresholds. In particular, for N examples, there are at most
2N dichotomies (see page 22 of the lecture 5 slides), and thus at most 2N different Ein values. We
can then easily choose the dichotomy that leads to the lowest Ein, where ties can be broken by
randomly choosing among the lowest-Ein ones. The chosen dichotomy stands for a combination
of some “spot” (range of θ) and s, and commonly the median of the range is chosen as the θ that
realizes the dichotomy.
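For concreteness, here is a minimal sketch of this enumeration in Python (assuming NumPy; the function name train_stump and its interface are our own illustrative choices, not part of the assignment):

    import numpy as np

    def train_stump(x, y):
        # Enumerate the (at most) 2N candidate dichotomies described above.
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        # Candidate thresholds: one point below the smallest x, plus the
        # median (midpoint) of each "spot" between consecutive sorted points.
        thresholds = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0))
        best_s, best_theta, best_ein = 1, thresholds[0], np.inf
        for theta in thresholds:
            for s in (1, -1):
                ein = np.mean(s * np.sign(xs - theta) != ys)
                # (The text suggests breaking ties randomly; for brevity,
                # this sketch simply keeps the first minimum found.)
                if ein < best_ein:
                    best_s, best_theta, best_ein = s, theta, ein
        return best_s, best_theta, best_ein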
In the next problem, you are asked to implement such an algorithm and run your program on an
artificial data set. We shall start by generating a one-dimensional data set by the procedure below:
(a) Generate x by a uniform distribution in [−1, 1].
(b) Generate y by f(x) = s̃(x) + noise, where s̃(x) = sign(x) and the noise flips the result with
20% probability.
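For instance, such a data set could be generated in Python as follows (a sketch; the function name generate_data and the NumPy-based sampling are our own choices):

    import numpy as np

    def generate_data(n=20, noise=0.2, seed=None):
        # Implements steps (a) and (b) above.
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1.0, 1.0, size=n)  # (a) x ~ Uniform[-1, 1]
        y = np.sign(x)                      # (b) noiseless label sign(x)
        flip = rng.random(n) < noise        #     flip each label w.p. 20%
        y[flip] = -y[flip]
        return x, y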
For any decision stump h_{s,θ} with θ ∈ [−1, 1], express Eout(h_{s,θ}) as a function of θ and s. Write
down your derivation steps.
7. (20 points, *) Generate a data set of size 20 by the procedure above and run the one-dimensional
decision stump algorithm on the data set. Record Ein and compute Eout with the formula above.
Repeat the experiment 1000 times and plot a histogram of Ein − Eout. Describe your findings.
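A possible skeleton for this experiment, reusing the sketches above (matplotlib is assumed for the histogram; here Eout is estimated on a large fresh test sample as a stand-in for the closed-form expression you derive in Problem 6):

    import numpy as np
    import matplotlib.pyplot as plt

    diffs = []
    for _ in range(1000):
        x, y = generate_data(n=20)
        s, theta, e_in = train_stump(x, y)
        # Monte Carlo estimate of Eout(h_{s,theta}) on fresh data:
        x_test, y_test = generate_data(n=100_000)
        e_out = np.mean(s * np.sign(x_test - theta) != y_test)
        diffs.append(e_in - e_out)

    plt.hist(diffs, bins=30)
    plt.xlabel("Ein - Eout")
    plt.ylabel("frequency")
    plt.show()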
Bonus: More about the Bounding Function
8. (Bonus 20 points) In class, we have shown that
B(N, k) ≤ \sum_{i=0}^{k-1} \binom{N}{i}.
Show that in fact the equality holds.