Homework 3: Max-Margin and SVM
Introduction
This homework assignment will have you work with max-margin methods and SVM classification. The aims of the assignment are (1) to further develop your geometric intuition for margin-based classification and decision boundaries, (2) to explore the properties of kernels and how they provide a different form of feature representation than explicit basis functions, and (3) to implement a basic kernel-based classifier.
There is a mathematical component and a programming component to this homework. If a question requires you to make any plots, like Problem 3, please include those in the writeup.
Problem 1 (Fitting an SVM by hand, 7pts) For this problem you will solve an SVM without the help of a computer, relying instead on principled rules and properties of these classifiers. Consider a dataset with the following 7 data points, each with $x \in \mathbb{R}$:
$$\{(x_i, y_i)\}_i = \{(-3,+1),\,(-2,+1),\,(-1,-1),\,(0,-1),\,(1,-1),\,(2,+1),\,(3,+1)\}$$
Consider mapping these points to 2 dimensions using the feature vector $\phi(x) = (x, x^2)$. The hard-margin classifier training problem is:
$$\min_{w,\,w_0} \; \|w\|_2^2 \quad\quad (1)$$
$$\text{s.t.}\;\; y_i\bigl(w^\top \phi(x_i) + w_0\bigr) \ge 1, \;\; \forall i \in \{1, \dots, n\}$$
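For concreteness (an illustration only, not part of the required answer), the feature map sends the first training point $(x_1, y_1) = (-3, +1)$ to $\phi(x_1) = (-3, 9)$, so its constraint in (1) reads, with $w = (w_1, w_2)$:
$$y_1\bigl(w^\top \phi(x_1) + w_0\bigr) = (+1)\bigl(-3\,w_1 + 9\,w_2 + w_0\bigr) \ge 1.$$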
The exercise has been broken down into a series of questions, each providing a part of the solution. Make sure to follow the logical structure of the exercise when composing your answer and to justify each step.
1. Plot the training data in $\mathbb{R}^2$ and draw the decision boundary of the max-margin classifier. (A plotting sketch is provided after this list.)
2. What is the value of the margin achieved by the optimal decision boundary?
3. What is a vector that is orthogonal to the decision boundary?
4. Considering the discriminant $h(\phi(x); w, w_0) = w^\top \phi(x) + w_0$, give an expression for all possible $(w, w_0)$ that define the optimal decision boundary. Justify your answer.
5. Consider now the training problem (1). Using your answers so far, which particular solution for $w$ will be optimal for this optimization problem?
6. Now solve for the corresponding value of $w_0$, using your general expression from part (4) for the optimal decision boundary. Write down the discriminant function $h(\phi(x); w, w_0)$.
7. What are the support vectors of the classifier? Confirm that the solution in part (6) makes the constraints in (1) binding for the support vectors.
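A minimal Python sketch of the plotting step in question 1, assuming numpy and matplotlib are available; drawing the max-margin decision boundary itself is left to you.

```python
import numpy as np
import matplotlib.pyplot as plt

# The 7 training points (x, y) and the feature map phi(x) = (x, x^2)
data = [(-3, +1), (-2, +1), (-1, -1), (0, -1), (1, -1), (2, +1), (3, +1)]
xs = np.array([x for x, _ in data], dtype=float)
ys = np.array([y for _, y in data])
phi = np.column_stack([xs, xs ** 2])            # mapped points in R^2

# Scatter the two classes in the (x, x^2) plane
plt.scatter(phi[ys == +1, 0], phi[ys == +1, 1], marker="+", label="y = +1")
plt.scatter(phi[ys == -1, 0], phi[ys == -1, 1], marker="o", label="y = -1")
plt.xlabel("x")
plt.ylabel("x^2")
plt.legend()
# TODO: add the max-margin decision boundary you derive
plt.show()
```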
Solution
Problem 2 (Composing Kernel Functions, 10pts) A key benefit of SVM training is the ability to use kernel functions $K(x, x')$ as opposed to explicit basis functions $\phi(x)$. Kernels make it possible to implicitly express large or even infinite-dimensional basis features. We do this by computing $\phi(x)^\top \phi(x')$ directly, without ever computing $\phi(x)$. When training SVMs, we begin by computing the kernel matrix $\mathbf{K}$ over our training data $\{x_1, \dots, x_n\}$. The kernel matrix, defined as $\mathbf{K}_{i,i'} = K(x_i, x_{i'})$, expresses the kernel function applied between all pairs of training points.
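As a concrete illustration of the definition $\mathbf{K}_{i,i'} = K(x_i, x_{i'})$, here is a minimal numpy sketch that builds the kernel matrix over a toy training set; the quadratic kernel used here is an assumed example, not one required by the problem.

```python
import numpy as np

def kernel_matrix(X, kernel):
    """Build K with K[i, j] = kernel(x_i, x_j) over all pairs of training points."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# Example kernel: (x . x')^2, which corresponds to an explicit degree-2 basis
quad = lambda x, xp: float(np.dot(x, xp)) ** 2

X = np.array([[-3.0], [-1.0], [0.0], [2.0]])       # toy 1-D training points
K = kernel_matrix(X, quad)
print(np.linalg.eigvalsh(K).min() >= -1e-9)        # PSD check: smallest eigenvalue >= 0 (up to rounding)
```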
In class, we saw Mercer’s theorem, which tells us that any function K that yields a positive semidefinite kernel matrix forms a valid kernel, i.e. corresponds to a matrix of dot-products under some basis φ. Therefore instead of using an explicit basis, we can build kernel functions directly that fulfill this property.
A particularly nice benefit of this theorem is that it allows us to build more expressive kernels by composition. In this problem, you are tasked with using Mercer's theorem and the definition of a kernel matrix to prove that the following compositions are valid kernels, assuming $K^{(1)}$ and $K^{(2)}$ are valid kernels. Recall that a positive semi-definite matrix $\mathbf{K}$ requires $z^\top \mathbf{K} z \ge 0$, $\forall z \in \mathbb{R}^n$.
1. $K(x, x') = c\,K^{(1)}(x, x')$ for $c > 0$
2. $K(x, x') = K^{(1)}(x, x') + K^{(2)}(x, x')$
3. $K(x, x') = f(x)\,K^{(1)}(x, x')\,f(x')$ where $f$ is any function from $\mathbb{R}^m$ to $\mathbb{R}$
4. $K(x, x') = K^{(1)}(x, x')\,K^{(2)}(x, x')$
[Hint: Use the property that for any $\phi(x)$, $K(x, x') = \phi(x)^\top \phi(x')$ forms a positive semi-definite kernel matrix.]
5. (a) The exp function can be written as
$$\exp(x) = \lim_{i \to \infty} \left(1 + x + \cdots + \frac{x^i}{i!}\right).$$
Use this to show that $\exp(x x')$ (here $x, x' \in \mathbb{R}$) can be written as $\phi(x)^\top \phi(x')$ for some basis function $\phi(x)$. Derive this basis function, and explain why this would be hard to use as a basis in standard logistic regression.
(b) Using the previous identities, show that $K(x, x') = \exp\bigl(K^{(1)}(x, x')\bigr)$ is a valid kernel.
6. Finally, use this analysis and the previous identities to prove the validity of the Gaussian kernel:
$$K(x, x') = \exp\!\left(-\frac{\|x - x'\|_2^2}{2\sigma^2}\right)$$
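Before proving these results, they can be sanity-checked numerically. Below is a small sketch (assuming numpy; the kernels and random data are arbitrary choices) that builds the Gram matrices of the compositions above and inspects their smallest eigenvalues. A numerical check is of course not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # 20 random points in R^3
sigma = 1.5

def gram(kernel):
    """Gram (kernel) matrix of `kernel` over the rows of X."""
    return np.array([[kernel(a, b) for b in X] for a in X])

lin   = lambda a, b: float(a @ b)                  # K1: linear kernel
quad  = lambda a, b: float(a @ b) ** 2             # K2: quadratic kernel
gauss = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

compositions = [
    ("c * K1 (c = 3)", lambda a, b: 3 * lin(a, b)),
    ("K1 + K2",        lambda a, b: lin(a, b) + quad(a, b)),
    ("K1 * K2",        lambda a, b: lin(a, b) * quad(a, b)),
    ("exp(K1)",        lambda a, b: np.exp(lin(a, b))),
    ("Gaussian",       gauss),
]
for name, k in compositions:
    lam_min = np.linalg.eigvalsh(gram(k)).min()
    print(f"{name:16s} smallest eigenvalue = {lam_min:.2e}")   # expect >= -1e-9
```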
Solution
Problem 3 (Scaling up your SVM solver, 10pts (+opportunity for extra credit)) For this problem you will build a simple SVM classifier for a binary classification problem. We have provided you with two files for experimentation: training data.csv and validation val.csv.
• First, read the paper at http://www.jmlr.org/papers/volume6/bordes05a/bordes05a.pdf and implement the Kernel Perceptron algorithm and the Budget Kernel Perceptron algorithm. Aim to make the optimization as fast as possible. Implement these algorithms in problem3.py.
[Hint: For this problem, efficiency will be an issue. Instead of directly implementing these algorithms using numpy matrices, you should utilize Python dictionaries to represent sparse matrices. This will be necessary to have the algorithms run in a reasonable amount of time.]
• Next, experiment with the hyperparameters for each of these models. See if you can identify some patterns by changing β, N (the maximum number of support vectors), or the number of random training samples taken during the Randomized Search procedure (Section 4.3). Note the training time, training and validation accuracy, and number of support vectors for various setups.
• Lastly, compare the classification to the naive SVM imported from scikit-learn by reporting accuracy on the provided validation data.
For extra credit, implement the SMO algorithm and the LASVM process and do the same as above.[a]
We are intentionally leaving this problem open-ended to allow for experimentation, and so we will be looking for your thought process and not a particular graph. Visualizations should be generated using the provided code. You can use the trivial kernel $K(x, x') = x^\top x'$ for this problem, though you are welcome to experiment with more interesting kernels too.
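For orientation only, here is a minimal sketch of the online kernel perceptron update, using a Python dictionary to hold the (sparse) support-vector coefficients as the hint suggests. Variable names are assumptions, it is not the required implementation, and it omits the budget-maintenance step of the Budget Kernel Perceptron.

```python
import numpy as np

def linear_kernel(x, xp):
    return float(np.dot(x, xp))

def kernel_perceptron(X, y, kernel=linear_kernel, epochs=1):
    """Online kernel perceptron: alphas maps a training index i -> its coefficient."""
    alphas = {}                                    # sparse set of support vectors
    for _ in range(epochs):
        for t in range(len(X)):
            # Discriminant f(x_t) = sum_i alpha_i K(x_i, x_t) over current support vectors
            f = sum(a * kernel(X[i], X[t]) for i, a in alphas.items())
            if y[t] * f <= 0:                      # mistake: add x_t as a support vector
                alphas[t] = alphas.get(t, 0.0) + y[t]
    return alphas

def predict(alphas, X_train, x, kernel=linear_kernel):
    f = sum(a * kernel(X_train[i], x) for i, a in alphas.items())
    return 1 if f > 0 else -1
```

A budget variant would additionally evict one support vector whenever the dictionary grows past N; see the paper for the specific eviction rule.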
In addition, provide answers to the following reading questions, in one or two sentences each.
1. In one short sentence, state the main purpose of the paper.
2. Describe each of the parameters in Eq. 1 of the paper.
3. State, informally, one guarantee about the Kernel perceptron algorithm described in the paper.
4. What is the main way the budget kernel perceptron algorithm tries to improve on the perceptron algorithm?
5. (if you did the extra credit) In simple words, what is the theoretical guarantee of the LASVM algorithm? How does it compare to its practical performance?
[a] Extra credit only makes a difference to your grade at the end of the semester if you are on a grade boundary.
Solution
Calibration [1pt]
Approximately how long did this homework take you to complete?
