Assignment Six
ECE 4200
• Provide credit to any sources other than the course staff that helped you solve the problems.
This includes all students you talked to regarding the problems.
• You can look up definitions/basics online (e.g., Wikipedia, Stack Exchange, etc.).
• The due date is 10/25/2020, 23:59:59 ET.
• Submission rules are the same as previous assignments.
Problem 1. (15 points). SVMs obtain non-linear decision boundaries by mapping the feature vectors $\bar{X} \in \mathbb{R}^d$ to a possibly high dimensional space via a function $\phi : \mathbb{R}^d \to \mathbb{R}^m$, and then finding a linear decision boundary in the new space.
We also saw that to implement SVM, it suffices to know the kernel function $K(\bar{X}_i, \bar{X}_j) = \phi(\bar{X}_i) \cdot \phi(\bar{X}_j)$, without even explicitly specifying the function $\phi$.
Recall Mercer's theorem. $K$ is a kernel function if and only if for any $n$ vectors $\bar{X}_1, \ldots, \bar{X}_n \in \mathbb{R}^d$, and any real numbers $c_1, \ldots, c_n$,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(\bar{X}_i, \bar{X}_j) \ge 0.$$
1. Prove the following half of Mercer's theorem (which we showed in class): if $K$ is a kernel, then
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(\bar{X}_i, \bar{X}_j) \ge 0.$$
2. Let $d = 1$, and $x, y \in \mathbb{R}$. Is the function $K(x, y) = x + y$ a kernel?
3. Let $d = 1$, and $x, y \in \mathbb{R}$. Is $K(x, y) = xy + 1$ a kernel?
4. Suppose $d = 2$, namely the original features are of the form $\bar{X}_i = [\bar{X}^1, \bar{X}^2]$. Show that
$$K(\bar{X}, \bar{Y}) = (1 + \bar{X} \cdot \bar{Y})^2$$
is a kernel function. This is called the quadratic kernel.
(Hint: Find a $\phi : \mathbb{R}^2 \to \mathbb{R}^m$ (for some $m$) such that $\phi(\bar{X}) \cdot \phi(\bar{Y}) = (1 + \bar{X} \cdot \bar{Y})^2$.)
5. Consider the training examples $\langle [0, 1], 1\rangle$, $\langle [1, 2], 1\rangle$, $\langle [-1, 2], 1\rangle$, $\langle [0, 11], 1\rangle$, $\langle [3, 4], -1\rangle$, $\langle [-3, 4], -1\rangle$, $\langle [1, -1], -1\rangle$, $\langle [-1, -1], -1\rangle$. We have plotted the data points below.
• Is the data linearly classifiable in the original 2-d space? If yes, please come up with any linear decision boundary that separates the data. If no, please explain why.
• Is the data linearly classifiable in the feature space corresponding to the quadratic kernel? If yes, please come up with any linear decision boundary that separates the data. If no, please explain why.
[Figure: scatter plot of the eight training points listed above in the original 2-d feature space.]
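As an optional numerical sanity check for parts 2-4 (not a substitute for a proof), one can sample a few points, build the Gram matrix of a candidate kernel, and inspect its eigenvalues: Mercer's condition requires this matrix to be positive semidefinite for every choice of points. The sketch below is not part of the assignment; it assumes only numpy, the helper name check_psd is made up for illustration, and it is shown with the quadratic kernel from part 4.

import numpy as np

def check_psd(kernel, points, tol=1e-9):
    # Build the Gram matrix G[i, j] = kernel(x_i, x_j) and report whether
    # its smallest eigenvalue is (numerically) nonnegative.
    n = len(points)
    gram = np.array([[kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)])
    min_eig = np.linalg.eigvalsh(gram).min()
    return min_eig >= -tol, min_eig

quadratic = lambda x, y: (1.0 + np.dot(x, y)) ** 2   # candidate kernel from part 4
rng = np.random.default_rng(0)
pts = list(rng.normal(size=(20, 2)))                 # 20 random points in R^2
print(check_psd(quadratic, pts))                     # a valid kernel should pass

Finding even one set of points with a negative eigenvalue certifies that a candidate is not a kernel; passing the check on one random sample, of course, proves nothing by itself.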
Problem 2. (10 points). Let $f, h_i$, $1 \le i \le n$, be real-valued functions and let $\alpha \in \mathbb{R}^n$. Let $L(z, \alpha) = f(z) + \sum_{i=1}^{n} \alpha_i h_i(z)$. In this problem, we will prove that the following two optimization problems are equivalent.
$$\min_{z} \; f(z) \quad \text{s.t.} \quad h_i(z) \le 0, \; i = 1, \ldots, n. \tag{1}$$
$$\min_{z} \left( \max_{\alpha \ge 0} L(z, \alpha) \right) \tag{2}$$
Let $(z^*, \alpha^*)$ be the solution of (2) and let $z_p^*$ be the solution of (1). Prove that:
$$L(z^*, \alpha^*) = f(z_p^*)$$
Hint: Use the fact that for any $z$ and $\alpha \ge 0$, $L(z^*, \alpha^*) \ge L(z^*, \alpha)$ and $L(z^*, \alpha^*) \le L(z, \alpha_z)$, where $\alpha_z = \arg\max_{\alpha \ge 0} L(z, \alpha)$.
You may follow the steps below, but this is not required as long as your proof is correct.
1. Prove that $L(z^*, \alpha^*) \le f(z_p^*)$.
2. Prove that $L(z^*, \alpha^*) \ge f(z_p^*)$.
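As a concrete (and entirely optional) illustration of what this equivalence says, not part of the required proof, take $f(z) = z^2$ with a single constraint $h_1(z) = 1 - z$, so the feasible set is $z \ge 1$. Problem (1) gives $\min_{z \ge 1} z^2 = 1$, attained at $z_p^* = 1$. For problem (2), $L(z, \alpha) = z^2 + \alpha(1 - z)$, so
$$\max_{\alpha \ge 0} L(z, \alpha) = \begin{cases} z^2, & z \ge 1, \\ +\infty, & z < 1, \end{cases}$$
and the outer minimization again gives the value $1$ at $z^* = 1$, matching $f(z_p^*)$.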
Problem 3. (15 points). In this problem, we derive the dual formulation of the soft-margin SVM problem with $\bar{\xi} = \xi_1, \ldots, \xi_n$.
$$\min_{\bar{w}, t, \bar{\xi}} \; \frac{1}{2}\|\bar{w}\|_2^2 + C \cdot \sum_{i=1}^{n} \xi_i$$
such that
$$1 - \xi_i - y_i(\bar{X}_i \cdot \bar{w} - t) \le 0, \quad i = 1, \ldots, n,$$
$$-\xi_i \le 0, \quad i = 1, \ldots, n. \tag{3}$$
Now we can define $2n$ Lagrangian variables $\alpha = \alpha_1, \ldots, \alpha_n$ and $\beta = \beta_1, \ldots, \beta_n$ corresponding to these constraints and obtain the following Lagrangian.
$$L(\bar{w}, t, \bar{\xi}, \alpha, \beta) = \frac{1}{2}\|\bar{w}\|_2^2 + C \cdot \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \left(1 - \xi_i - y_i(\bar{X}_i \cdot \bar{w} - t)\right) - \sum_{i=1}^{n} \beta_i \xi_i \tag{4}$$
The original problem is now equivalent to
$$\min_{\bar{w}, t, \bar{\xi}} \left( \max_{\alpha \ge 0, \beta \ge 0} L(\bar{w}, t, \bar{\xi}, \alpha, \beta) \right).$$
For this objective, the minimax theorem says that the min-max problem is equivalent to the max-min problem below. You do not have to prove this, but are encouraged to do so (using the argument given in the discussion earlier). This gives the following problem.
$$\max_{\alpha \ge 0, \beta \ge 0} \left( \min_{\bar{w}, t, \bar{\xi}} L(\bar{w}, t, \bar{\xi}, \alpha, \beta) \right).$$
1. For fixed $\alpha, \beta$, take the gradient of the Lagrangian with respect to $\bar{w}$, and express $\bar{w}$ in terms of the other variables.
2. Differentiate the Lagrangian with respect to $t$, and equate to zero to obtain another equation.
3. Differentiate the Lagrangian with respect to $\xi_i$ and show that $\alpha_i + \beta_i = C$ at the optimum. This shows that $\alpha_i \le C$, since $\beta_i \ge 0$.
4. The expressions from 1, 2, 3 define one of the KKT conditions. Show that under these conditions,
$$\min_{\bar{w}, t, \bar{\xi}} L(\bar{w}, t, \bar{\xi}, \alpha, \beta) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (\bar{X}_i \cdot \bar{X}_j).$$
Basically, use the expressions from 1, 2, and 3 to "cancel out" $\bar{w}$, $t$, $\bar{\xi}$ in the Lagrangian.
5. Combine the results above to argue that the following optimization is equivalent to the soft-margin SVM we started with.
$$\max_{\alpha \ge 0} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (\bar{X}_i \cdot \bar{X}_j)$$
such that
$$0 \le \alpha_i \le C, \quad i = 1, \ldots, n. \tag{5}$$
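The equivalence in step 5 can also be checked numerically; this is optional and not part of the graded work. In sklearn's SVC with a linear kernel, dual_coef_ stores $y_i \alpha_i$ for the support vectors, so one can verify the box constraint in (5) and confirm that the optimal values of (3) and (5) coincide. The data below are made up purely for illustration, and the identification $t = -\texttt{intercept\_}$ is our assumption to bridge sklearn's $\bar{X} \cdot \bar{w} + b$ convention with the notation above.

import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up purely for illustration).
X = np.array([[2.0, 2.0], [1.5, 3.0], [3.0, 1.0],
              [-2.0, -2.0], [-1.0, -3.0], [-3.0, -1.5]])
y = np.array([1, 1, 1, -1, -1, -1])
C = 1.0

clf = SVC(kernel="linear", C=C, tol=1e-8).fit(X, y)

ya = clf.dual_coef_.ravel()        # y_i * alpha_i for the support vectors
alpha = np.abs(ya)                 # alpha_i (zero for non-support vectors)
sv = clf.support_vectors_

# Dual objective from (5).
dual_obj = alpha.sum() - 0.5 * ya @ (sv @ sv.T) @ ya

# Primal objective from (3), taking t = -intercept_.
w = clf.coef_.ravel()
margins = y * (X @ w + clf.intercept_[0])
xi = np.maximum(0.0, 1.0 - margins)
primal_obj = 0.5 * w @ w + C * xi.sum()

print(np.all(alpha <= C + 1e-8))                      # box constraint 0 <= alpha_i <= C
print(np.isclose(dual_obj, primal_obj, atol=1e-4))    # matching optimal values (up to solver tolerance)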
Problem 4. (25 points) SVM Classification. Please refer to the Jupyter Notebook in the assignment, and complete the coding part in it! You can use the sklearn SVM package: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
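For reference, a minimal usage sketch of sklearn.svm.SVC is given below. The data here are synthetic placeholders and the kernel and hyperparameter choices are only examples; the actual data loading, kernels to try, and evaluation steps are specified in the assignment's notebook.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic placeholder data; replace with the arrays provided in the notebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)    # labels with a non-linear boundary

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# kernel can be "linear", "poly", or "rbf"; C and gamma are the usual tuning knobs.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))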