Machine Learning Homework 3
Homework must be submitted electronically on Canvas. Make sure to explain your reasoning or show your
derivations. Except for answers that are especially straightforward, you will lose points for unjustified
answers, even if they are correct.
General Instructions
You are allowed to work with at most one other student on the homework. With your partner, you will
submit only one copy, and you will share the grade that your submission receives. You should set up your
partnership on Canvas as a two-person group.
Submit your homework electronically on Canvas. We recommend using LaTeX, especially for the written problems. But you are welcome to use anything as long as it is neat and readable.
For the programming portion, you should only need to modify the Python files. You may modify the
iPython notebook, but you will not be submitting it, so your code must work with our provided iPython
notebook.
Cite all outside sources of information and ideas.
Written Problems
1. Machine learning methods can be viewed as function estimators. Consider the logical functions AND,
OR, and XOR. Using a signed representation for Boolean variables, where input and output variables
are in {+1, −1}, these functions are defined as

$$\mathrm{AND}(x_1, x_2) = \begin{cases} +1 & \text{if } x_1 = +1 \wedge x_2 = +1 \\ -1 & \text{otherwise} \end{cases} \tag{1}$$

$$\mathrm{OR}(x_1, x_2) = \begin{cases} +1 & \text{if } x_1 = +1 \\ +1 & \text{if } x_2 = +1 \\ -1 & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{XOR}(x_1, x_2) = \begin{cases} +1 & \text{if } x_1 = +1 \wedge x_2 = -1 \\ +1 & \text{if } x_2 = +1 \wedge x_1 = -1 \\ -1 & \text{otherwise} \end{cases} \tag{3}$$
(a) (6 points) Which of these three logical functions can be expressed as a linear classifier of the form

$$f(x; w) = \mathrm{sign}(w_1 x_1 + w_2 x_2 + b)? \tag{4}$$

For each function that can, show weights $w_1$, $w_2$ and a bias value $b$ that mimic the logical function. Which of these functions cannot be expressed as a linear classifier, and why not?
(b) (6 points) For any logical function that cannot be expressed as a linear classifier, show how and
with what weights it can be expressed as a two-layered perceptron of the form

$$f(x; w) = \mathrm{sign}(w^{\text{out}\top} h + b^{\text{out}}) \tag{5}$$
$$h = [h_1, h_2]^\top \tag{6}$$
$$h_1 = \mathrm{sign}(w^{(1)\top} x + b^{(1)}) \tag{7}$$
$$h_2 = \mathrm{sign}(w^{(2)\top} x + b^{(2)}) \tag{8}$$

For this problem, you must specify three weight vectors ($w^{\text{out}}$, $w^{(1)}$, $w^{(2)}$) and three biases ($b^{\text{out}}$, $b^{(1)}$, $b^{(2)}$).
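To sanity-check candidate weights before writing them up, it can help to evaluate them on all four input combinations. The snippet below is only a minimal self-checking sketch, not part of the graded submission; the placeholder weights and the helper name linear_classifier are made up for illustration.

import numpy as np

# Truth tables in the signed {+1, -1} representation defined above.
inputs = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
AND = np.array([+1, -1, -1, -1])
OR = np.array([+1, +1, +1, -1])
XOR = np.array([-1, +1, +1, -1])

def linear_classifier(x, w, b):
    """Evaluate f(x; w) = sign(w1*x1 + w2*x2 + b) for each row of x."""
    return np.sign(x @ w + b)

# Placeholder values: substitute the weights and bias you derive in part (a).
w_candidate, b_candidate = np.array([0.0, 0.0]), 0.0
print(np.all(linear_classifier(inputs, w_candidate, b_candidate) == AND))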
Programming Assignment
For this programming assignment, we have provided a lot of starter code. Your tasks will be to complete
the code in a few specific places, which will require you to read and understand most of the provided code,
but will only require you to write a small amount of code yourself.
In syntheticData.mat, there are 10 synthetic, two-dimensional datasets. None of the datasets except the
first is linearly separable. We provided the main experiment notebook run_synthetic_experiment.ipynb
for you to use. For each model, the notebook uses cross-validation on the training data to choose learning
parameters, trains on the full training set using those parameters, and evaluates the accuracy on the test set.
We have also provided a unit test class in test_mlp_and_svm.py, which contains unit tests for each type
of learning model. These unit tests may be easier to use for debugging in an IDE like PyCharm than
the iPython notebook. A successful implementation should pass all unit tests and run through the entire
iPython notebook without issues. You can run the unit tests from a *nix command line with the command
python -m unittest -v test_mlp_and_svm
or you can use an IDE’s unit test interface.
Before starting all the tasks, examine the entire codebase. Follow the code from the iPython notebook to
see which methods it calls. Make sure you understand what all of the code does.
1. (6 points) Finish the back-propagation operation in the mlp_objective function in mlp.py.
You should implement the matrix form of back-propagation that we covered in class, which we replicate here. First, forward propagation computes the predicted values. We refer to the neural activation
values as $h$ vectors. For notational convenience, for input $x$, let $h_0 = x$. Then forward propagation
computes $h_{i+1} \leftarrow s(W_i h_i)$ for $i$ from 0 to the depth of the MLP (num_layers − 1). If num_layers $= m$,
the output is $h_m$. Forward propagation also computes the derivatives of the squashing function and
stores them in squash_derivatives, i.e., squash_derivatives[i] $:= s'(W_i h_i)$.
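For concreteness, here is a minimal sketch of the forward pass described above. It assumes a hypothetical list weights of per-layer matrices and a tanh squashing function; the starter code's actual data structures and squashing function may differ.

import numpy as np

def forward(x, weights, squash=np.tanh, squash_grad=lambda a: 1 - np.tanh(a) ** 2):
    """Compute h_{i+1} = s(W_i h_i) for each layer, saving s'(W_i h_i)."""
    hs = [x]                      # h_0 = x
    squash_derivatives = []
    for W in weights:             # W_0, ..., W_{m-1}
        a = W @ hs[-1]            # pre-squash activation W_i h_i
        hs.append(squash(a))      # h_{i+1} = s(W_i h_i)
        squash_derivatives.append(squash_grad(a))
    return hs, squash_derivatives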
Back-propagation should start by computing the "errors" $\delta$, i.e., the gradients of the loss with respect
to each layer's pre-squashed inputs, backwards from the output layer ($\delta_m$) back to the input layer
($\delta_1$). Use the chain-rule recursion, which starts with the last layer $\delta_m = \ell'(f(x, W))$, and for all other
layers,

$$\delta_i = \left( W_{i+1}^\top \delta_{i+1} \right) \circ s'(W_i h_i) \tag{9}$$

where the $\circ$ operator is the element-wise product (the default product for numpy ndarrays).
Finally, now that the errors have been computed, compute the gradient for each layer with the formula

$$\nabla_{W_i} \ell = \delta_i h_i^\top \tag{10}$$
You should implement these two equations in the marked TODO lines in mlp.py.
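As a rough illustration of equations (9) and (10), the recursion can be written as a single backward loop. The variable names (weights, hs, squash_derivatives, loss_grad) follow the description above and are assumptions about the starter code rather than its exact interface; here all vectors are 1-D numpy arrays.

import numpy as np

def backprop(weights, hs, squash_derivatives, loss_grad):
    """Sketch of (9) and (10); loss_grad plays the role of delta_m = l'(f(x, W))."""
    num_layers = len(weights)
    deltas = [None] * num_layers
    deltas[-1] = loss_grad
    for i in range(num_layers - 2, -1, -1):
        # Equation (9): delta_i = (W_{i+1}^T delta_{i+1}) o s'(W_i h_i)
        deltas[i] = (weights[i + 1].T @ deltas[i + 1]) * squash_derivatives[i]
    # Equation (10): the gradient for W_i is the outer product delta_i h_i^T
    return [np.outer(deltas[i], hs[i]) for i in range(num_layers)]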
2. (6 points) Finish the support-vector machine setup in kernel_svm_train and kernel_svm_predict in
kernelsvm.py.
The dual quadratic program for kernel SVM is

$$\min_{\alpha} \;\; \frac{1}{2} \alpha^\top (K \circ y y^\top) \alpha - \sum_{i=1}^{n} \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \;\; \alpha_i \in [0, C] \tag{11}$$

where $K$ is the Gram matrix, $y$ is the vector of labels where $y_i$ is the label of the $i$th point, $C$ is the
regularization parameter, and the free vector $\alpha$ contains the dual variables. For the linear kernel, all of the
input quantities have been computed for you. Your job is to put them into the correct form to solve
the quadratic program in the TODO in kernel_svm_train. You will be using our custom wrapper for
the open-source quadprog solver, which you will need to install in order to run the code.
To install quadprog, pip install quadprog will sometimes work depending on your system, or
you can clone the repository at https://github.com/rmcgibbo/quadprog and use python setup.py
install to install it directly from source. Also, conda install -c omnia quadprog is another way
to install it.
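Because the provided wrapper's interface is specific to the starter code, the sketch below instead shows how the pieces of (11) could map onto quadprog's solve_qp directly, which solves min (1/2) x^T G x - a^T x subject to C^T x >= b, with the first meq constraints treated as equalities. This is only an illustration; the ridge term, names, and argument order are assumptions, and the course wrapper may expect a different form.

import numpy as np
import quadprog

def svm_dual_qp(K, y, C_reg):
    """Sketch: solve the dual (11) with quadprog.solve_qp."""
    n = len(y)
    y = y.astype(float)
    G = K * np.outer(y, y) + 1e-8 * np.eye(n)   # quadratic term K o y y^T, plus a
                                                # tiny ridge so G is positive definite
    a = np.ones(n)                              # linear term corresponds to sum_i alpha_i
    # Constraints in quadprog form C_mat^T alpha >= b:
    #   y^T alpha = 0 (equality, meq=1), alpha_i >= 0, and -alpha_i >= -C.
    C_mat = np.column_stack([y, np.eye(n), -np.eye(n)])
    b = np.concatenate([[0.0], np.zeros(n), -C_reg * np.ones(n)])
    alpha = quadprog.solve_qp(G, a, C_mat, b, meq=1)[0]
    return alpha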
Make sure to examine the code after the quadratic program completes. This code saves the information necessary for prediction with the returned model object, including the nonzero support vectors
and the bias b. Next, you should finish the linear SVM by implementing the predictor in the TODO in
kernel_svm_predict. The formula for prediction f(x) is
$$f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right). \tag{12}$$
As usual, you should be able to manipulate this equation to compute the predictions for a batch of
data without loops.
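As a rough sketch of that manipulation, assuming a kernel matrix K_test whose entry (i, j) is K(x_i, x_j) for support vector i and test point j (the starter code's variable names and orientation may differ):

import numpy as np

def svm_predict(K_test, alpha, y, b):
    """Vectorized form of (12): sign(sum_i alpha_i y_i K(x_i, x) + b) for every test point."""
    return np.sign((alpha * y) @ K_test + b)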
Once you complete this second TODO, the linear SVM should pass its tests, and the part of the iPython
notebook that tests the linear kernel should run (and perform poorly because the data is not linearly
separable). You may see some of the quadratic programs exit with warnings. These warnings are
okay; they result from the problem becoming poorly scaled and should not happen as often with the
non-linear kernels in the next problem.
3. (6 points) Finish the two kernel functions polynomial_kernel and rbf_kernel in kernelsvm.py.
This final task requires approximately two lines of code. You should return the Gram matrix for each
of these famous kernel functions. The formula for the polynomial kernel of order k is
$$K(x_i, x_j) = (x_i^\top x_j + 1)^k. \tag{13}$$
And the formula for the Gaussian radial-basis function (RBF) kernel is
$$K(x_i, x_j) = \exp\left( -\frac{1}{2\sigma^2} (x_i - x_j)^\top (x_i - x_j) \right), \tag{14}$$

where $\sigma$ is the kernel bandwidth parameter.
Both formulas must be computed without loops for full credit, i.e., your code should directly compute
these from the inputs row_data and col_data without slicing to compute over individual columns of
these. The polynomial kernel is fairly straightforward to convert into matrix operations. For the RBF
kernel, one hint is that it helps to expand the squared distance $(x_i - x_j)^\top (x_i - x_j)$ into
$x_i^\top x_i + x_j^\top x_j - 2 x_i^\top x_j$.
This expanded form is easier to reason about via matrices and vectors. Also, see the class slides for
some matrix formulas for these kernel computations.
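As an illustration of the loop-free computation, the sketch below assumes row_data and col_data store one example per column (d x n matrices) and that the RBF kernel receives a bandwidth sigma; if the starter code uses a different orientation or parameterization, the transposes change accordingly.

import numpy as np

def polynomial_gram(row_data, col_data, k):
    """Sketch of (13): entry (i, j) is (x_i^T x_j + 1)^k."""
    return (row_data.T @ col_data + 1) ** k

def rbf_gram(row_data, col_data, sigma):
    """Sketch of (14) via the expansion x_i^T x_i + x_j^T x_j - 2 x_i^T x_j."""
    row_norms = np.sum(row_data ** 2, axis=0)   # x_i^T x_i for each column of row_data
    col_norms = np.sum(col_data ** 2, axis=0)   # x_j^T x_j for each column of col_data
    sq_dists = row_norms[:, None] + col_norms[None, :] - 2 * (row_data.T @ col_data)
    return np.exp(-sq_dists / (2 * sigma ** 2))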