Deep learning and Neural Networks

CS 446 / ECE 449 — Homework 2

Version 1.0
Instructions.
• Homework is due Tuesday, February 22, at noon CST; no late homework accepted.
• Everyone must submit individually on Gradescope under hw2 and hw2code.
• The “written” submission at hw2 must be typed, and submitted in any format Gradescope accepts
(to be safe, submit a PDF). You may use LaTeX, Markdown, Google Docs, MS Word, whatever you like;
but it must be typed!
• When submitting at hw2, Gradescope will ask you to select pages for each problem; please do this
precisely!
• Please make sure your NetID is clear and large on the first page of the homework.
• Your solution must be written in your own words. Please see the course webpage for full academic
integrity information. Briefly, you may have high-level discussions with at most 3 classmates, whose
NetIDs you should place on the first page of your solutions, and you should cite any external reference
you use; despite all this, your solution must be written in your own words.
• We reserve the right to reduce the auto-graded score for hw2code if we detect funny business (e.g., your
solution lacks any algorithm and hard-codes answers you obtained from someone else, or simply via
trial-and-error with the autograder).
• Coding problems come with suggested “library routines”; we include these to reduce your time fishing
around APIs, but you are free to use other APIs.
• When submitting to hw2code, only upload the two python files hw2.py and hw2_utils.py. Don’t
upload a zip file or additional files.
Version history.
1.0. Initial version.
1. SVM with Biases.
This problem is about SVMs over $\mathbb{R}^d$ with linearly separable data (i.e., the hard-margin SVM).
Our formulation of SVM required separators to pass through the origin, which does not provide a
geometrically pleasing notion of maximum margin direction.
A first fix is provided by lecture 4: by appending a 1 to the inputs, we obtain the convex program
\[
\min_{u} \;\; \frac{1}{2}\|u\|^2
\qquad \text{subject to} \quad u \in \mathbb{R}^{d+1}, \quad
y_i \begin{pmatrix} x_i \\ 1 \end{pmatrix}^{\!\mathsf{T}} u \ge 1 \;\; \forall i,
\]
and let $\bar{u}$ denote the optimal solution to this program.
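To make the lifted program concrete, here is a minimal numerical sketch (not part of the required submission); the toy dataset and the use of cvxpy are illustrative assumptions, not course requirements.

```python
# Illustrative sketch only: solve the lifted hard-margin SVM
#   min_u (1/2)||u||^2  subject to  y_i [x_i; 1]^T u >= 1,
# on a toy separable dataset (chosen here purely for demonstration).
import numpy as np
import cvxpy as cp

X = np.array([[1.0, 0.0], [-3.0, 0.0]])        # two points in R^2
y = np.array([1.0, -1.0])
Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append a 1 to each input

u = cp.Variable(Xa.shape[1])
constraints = [cp.multiply(y, Xa @ u) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u)), constraints).solve()
print("u_bar =", u.value)  # the last coordinate acts as an implicit bias
```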
A second standard fix is to incorporate the bias directly into the optimization problem:
\[
\min_{v,\,b} \;\; \frac{1}{2}\|v\|^2
\qquad \text{subject to} \quad v \in \mathbb{R}^{d}, \;\; b \in \mathbb{R}, \quad
y_i \left( v^{\mathsf{T}} x_i + b \right) \ge 1 \;\; \forall i,
\]
and let $(\bar{v}, \bar{b}) \in \mathbb{R}^d \times \mathbb{R}$ denote an optimal solution to this program. This second version is standard, but
we do not use it in lecture for various reasons.
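For comparison, here is the same kind of sketch for the explicit-bias program on the same toy data (again an illustrative assumption, not required code).

```python
# Illustrative sketch only: solve the explicit-bias hard-margin SVM
#   min_{v,b} (1/2)||v||^2  subject to  y_i (v^T x_i + b) >= 1.
import numpy as np
import cvxpy as cp

X = np.array([[1.0, 0.0], [-3.0, 0.0]])
y = np.array([1.0, -1.0])

v = cp.Variable(X.shape[1])
b = cp.Variable()
constraints = [cp.multiply(y, X @ v + b) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(v)), constraints).solve()
print("v_bar =", v.value, " b_bar =", b.value)  # bias is not penalized here
```

Note that the objectives differ: the lifted program penalizes the bias coordinate as part of $\|u\|^2$, while this program leaves $b$ unpenalized; part (f) probes exactly this difference.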
(a) In lecture, we stated that the first formulation is a convex program (formally defined in lecture 5).
Show that the second formulation is also a convex program.
(b) Suppose there is only one datapoint: $x_1 = e_1$, the first standard basis vector, with label $y_1 = +1$.
The first formulation will have a unique solution $\bar{u}$, as discussed in lecture. Show that the second
formulation does not have a unique solution.
(c) Let’s add another datapoint: $x_2 = -a e_1$ for some $a \ge 3$, with label $y_2 = -1$. With two
data points, each convex program now has two constraints. Write out the explicit constraints
of the first convex program.
(d) Using these two constraints, show that the first coordinate $\bar{u}_1$ of the optimal solution $\bar{u}$ satisfies
$\bar{u}_1 \ge \frac{2}{a+1}$.
(e) Using parts (c) and (d), find optimal solutions $\bar{u}$ and $(\bar{v}, \bar{b})$, and prove they are in fact optimal.
Hint: If you are stuck, first try the case $d = 1$. Then study what happens for $d = 2$, $d = 3$, \ldots
Hint: $(\bar{v}, \bar{b})$ will be unique.
(f) Now we will consider the behavior of $\bar{u}$ and $\bar{v}$ as $a$ increases; to this end, write $\bar{u}_a$ and $\bar{v}_a$, and consider
$a \to \infty$. Determine and formally prove the limiting behavior of $\lim_{a\to\infty} \frac{1}{2}\|\bar{u}_a\|^2$ and $\lim_{a\to\infty} \frac{1}{2}\|\bar{v}_a\|^2$
(the numerical sketch after the remark below may help build intuition).
Hint: The two limits will not be equal.
(g) Between the two versions of SVM with bias, which do you prefer? Any answer which contains at
least one complete sentence will receive full credit.
Remark: Initially it may have seemed that both optimization problems have the same solutions; the
purpose of this problem was to highlight that small differences in machine learning methods can lead
to observably different performance.
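As referenced in part (f), the following sketch (an illustrative addition, assuming the cvxpy setup above) numerically tracks $\frac{1}{2}\|\bar{u}_a\|^2$ and $\frac{1}{2}\|\bar{v}_a\|^2$ as $a$ grows; it can build intuition but does not replace the required proof.

```python
# Illustrative sketch only: track both optimal objective values on the
# two-point dataset x1 = e1 (y1 = +1), x2 = -a*e1 (y2 = -1) as a grows.
import numpy as np
import cvxpy as cp

def objectives(a, d=2):
    X = np.zeros((2, d))
    X[0, 0], X[1, 0] = 1.0, -a
    y = np.array([1.0, -1.0])

    # lifted program: append a 1 to each input
    Xa = np.hstack([X, np.ones((2, 1))])
    u = cp.Variable(d + 1)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u)),
               [cp.multiply(y, Xa @ u) >= 1]).solve()

    # explicit-bias program
    v, b = cp.Variable(d), cp.Variable()
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(v)),
               [cp.multiply(y, X @ v + b) >= 1]).solve()

    return 0.5 * float(np.sum(u.value ** 2)), 0.5 * float(np.sum(v.value ** 2))

for a in [3, 10, 100, 1000]:
    print(a, objectives(a))
```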
Solution.