MACHINE LEARNING COMS 4771, HOMEWORK 1
Submit your work via courseworks.columbia.edu.
Please submit separate files for a) write-up, b) Matlab source files and c) figures (if you choose to
include them separately from the writeup). Do not include any other files. Your write-up should
be in ASCII plain text format (.txt) or Postscript (.ps) or the Adobe Portable Document Format
(.pdf). Please do not submit Microsoft Office documents, LaTeX source code, or something more
exotic since we will not be able to read it. LaTeX is preferred and highly recommended, but it is not
mandatory. You can use any document editing software you wish, as long as the final product is in
.ps or .pdf. Even if you do not use LaTeX to prepare your document, you can use LaTeX notation
to mark up complicated mathematical expressions, for example, in comments in your Matlab code.
See the Tutorials page for more information on LaTeX. All your code should be written in Matlab.
Please submit all your source files, each function in a separate file. Clearly denote what each function
does, which problem it belongs to, and what the inputs and the outputs are. Do not resubmit code or
data provided to you. Do not submit code written by others. Identical submissions will be detected
and both parties will get zero credit. In general, shorter code is better. Sample code is available
on the Tutorials web page. Datasets are available from the Handouts web page. You may include
figures directly in your write-up or include them separately as .jpg, .gif, .ps or .eps files, and refer
to them by filename.
1 Problem 1 (10 points)
Cross-validation for Polynomial Fitting: Consider the problem of fitting a polynomial function.
Assume we wish to find a one dimensional function that takes a scalar input and outputs a scalar
f : R → R. The function has the form
f(x; θ) = θ0 + θ1 x + θ2 x^2 + ... + θd x^d
where d is the degree of the polynomial. Develop code that finds the θ which minimizes the risk
Remp(θ) = (1/N) Σ_{i=1}^{N} (1/2) (yi − f(xi; θ))^2
on a data-set. To help you get started, download the Matlab code in “polyreg.m” (on the tutorial
web page) to do polynomial curve fitting. Use your code on the dataset “problem1.mat”. This file
contains a matrix x, corresponding to the scalar features {x1, . . . , xN }, and a matrix y, corresponding
to the scalar labels {y1, . . . , yN }. Fit a polynomial model to this data for various choices of d, the
degree of the polynomial.
Which value(s) of d seem more reasonable? Justify your answer using some empirical measure.
It is easy to overfit the data when using polynomial regression. As a result, use cross-validation
by randomly splitting the data-set into two halves to select the complexity of the model (in this
case, the degree of the polynomial). Include a plot showing the training and testing risk across
various choices of d, and plot your f(x; θ) overlaid on the data for the best choice of d according to
cross-validation.
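To make the fit-and-validate loop concrete, here is a rough sketch in NumPy (for illustration only — your submission must be in Matlab, and the synthetic data, helper names such as fit_poly, and the degree range below are placeholders, not part of the provided files):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for problem1.mat: noisy samples of a smooth function.
N = 40
x = rng.uniform(-1.0, 1.0, N)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(N)

def fit_poly(x, y, d):
    """Least-squares estimate of theta for a degree-d polynomial."""
    V = np.vander(x, d + 1, increasing=True)   # columns: x^0, x^1, ..., x^d
    theta, *_ = np.linalg.lstsq(V, y, rcond=None)
    return theta

def risk(x, y, theta):
    """Empirical risk: (1/N) * sum_i (1/2)(yi - f(xi; theta))^2."""
    V = np.vander(x, len(theta), increasing=True)
    return 0.5 * np.mean((y - V @ theta) ** 2)

# Two-fold cross-validation: randomly split the data into two halves.
perm = rng.permutation(N)
tr, te = perm[: N // 2], perm[N // 2 :]

for d in range(1, 10):
    theta = fit_poly(x[tr], y[tr], d)
    print(d, risk(x[tr], y[tr], theta), risk(x[te], y[te], theta))
```

The training risk keeps falling as d grows, while the held-out risk eventually rises again; the d you report should come from the minimum of the testing curve, not the training curve.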
2 Problem 2 (10 points)
Regularized risk minimization: Modify the Matlab code for “polyreg.m” such that it learns a
multivariate regression function f : R^100 → R, where the basis functions are of the form

f(x; θ) = Σ_{i=1}^{k} θi xi
The data-set is available in “problem2.mat”. As before, the x variable contains {x1, . . . , xN } and the
y variable contains their scalar labels {y1, . . . , yN }.
Use an l2 penalty to control the complexity of the model, i.e. minimize the regularized risk

Rreg(θ) = (1/N) Σ_{i=1}^{N} (1/2) (yi − f(xi; θ))^2 + (λ/2N) ‖θ‖^2
Use two-fold cross validation (as in Problem 1) to find the best value for λ. Include a plot showing
training and testing risk across various choices of λ. A reasonable range for this data set would be
from λ = 0 to λ = 1000. Also, mark the λ which minimizes the testing error on the data set.
What do you notice about the training and testing error?
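Since f(x; θ) is linear in θ here, setting the gradient of Rreg to zero gives the normal equations (X'X + λI) θ = X'y, which you can solve directly for each λ. A NumPy sketch of the λ sweep (illustration only — your submission must be in Matlab, and the synthetic data and helper names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for problem2.mat: N examples with 100 features each.
N, k = 60, 100
X = rng.standard_normal((N, k))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(N)

def fit_ridge(X, y, lam):
    """Minimizer of R_reg: solve (X'X + lam*I) theta = X'y.
    lstsq handles the singular lam = 0 case (fewer examples than features)."""
    k = X.shape[1]
    theta, *_ = np.linalg.lstsq(X.T @ X + lam * np.eye(k), X.T @ y, rcond=None)
    return theta

def reg_risk(X, y, theta, lam):
    """R_reg evaluated at theta (pass lam = 0 for the unregularized risk)."""
    N = X.shape[0]
    return 0.5 * np.mean((y - X @ theta) ** 2) + lam / (2 * N) * theta @ theta

# Two-fold split, then sweep lambda and compare training vs. testing risk.
perm = rng.permutation(N)
tr, te = perm[: N // 2], perm[N // 2 :]
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    theta = fit_ridge(X[tr], y[tr], lam)
    print(lam, reg_risk(X[tr], y[tr], theta, 0.0), reg_risk(X[te], y[te], theta, 0.0))
```

Note that with more features than training examples, λ = 0 lets the model interpolate the training set, so the gap between training and testing risk is at its most dramatic there.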
3 Problem 3 (10 points)
Logistic Squashing Function: The logistic squashing function is given by g(z) = 1/(1 + exp(−z)).
Show that it satisfies the property g(−z) = 1 − g(z). Also show that its inverse is given by
g^{−1}(y) = ln(y/(1 − y)).
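As a hint, both identities follow from short algebraic manipulations of the definition of g; a sketch of the kind of argument expected:

```latex
g(-z) = \frac{1}{1 + e^{z}}
      = \frac{e^{-z}}{e^{-z} + 1}
      = \frac{(1 + e^{-z}) - 1}{1 + e^{-z}}
      = 1 - g(z).

% For the inverse, set y = g(z) and solve for z:
y = \frac{1}{1 + e^{-z}}
\;\Longrightarrow\;
e^{-z} = \frac{1 - y}{y}
\;\Longrightarrow\;
z = \ln\frac{y}{1 - y} = g^{-1}(y).
```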
4 Problem 4 (20 points)
Logistic Regression: Implement a linear logistic regression algorithm for binary classification in
Matlab using gradient descent. Your code should accept a dataset {(x1, y1), . . . ,(xN , yN )} where
xi ∈ R
d and yi ∈ {0, 1} and find a parameter vector θ ∈ R
d
for the classification function
f(x; θ) = (1 + exp(−θ^T x))^{−1}
which minimizes the empirical risk with logistic loss
Remp(θ) = (1/N) Σ_{i=1}^{N} [ (yi − 1) log(1 − f(xi; θ)) − yi log(f(xi; θ)) ].
Since you are using gradient descent, you will have to specify the step size η and the convergence
tolerance ε. Pick reasonable values for η and ε, then use your code to learn a classification function for the
dataset in “dataset4.mat”. Type “load dataset4” and you will have the variables X (input vectors)
and Y (binary labels) in your Matlab environment which contain the dataset.
Show any derivations you need to make for this algorithm.
Use the whole data set as training. Show with figures the resulting linear decision boundary on the
2D X data. Show the binary classification error and the empirical risk you obtained throughout
the run from random initialization until convergence. Note the number of iterations needed for your
choice of η and ε.
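The key derivation is the gradient of the risk above, which works out to ∇Remp(θ) = (1/N) Σ_{i} (f(xi; θ) − yi) xi. A NumPy sketch of the resulting descent loop (illustration only — your submission must be in Matlab, and the toy data, stopping rule, and helper names below are placeholders, not the provided dataset4):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    """Logistic squashing function g(z) = 1/(1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, eta=0.1, tol=1e-6, max_iter=20000):
    """Gradient descent on the logistic empirical risk.
    Stops when the per-step risk improvement drops below tol."""
    N, d = X.shape
    theta = np.zeros(d)

    def risk(theta):
        p = sigmoid(X @ theta)
        eps = 1e-12  # guard against log(0)
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    prev = risk(theta)
    for it in range(max_iter):
        grad = X.T @ (sigmoid(X @ theta) - y) / N   # (1/N) sum (f - y) x
        theta -= eta * grad
        cur = risk(theta)
        if prev - cur < tol:
            break
        prev = cur
    return theta, it + 1

# Toy linearly separable 2D data standing in for dataset4.mat.
N = 200
X = rng.standard_normal((N, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)

theta, iters = logistic_gd(X, y)
pred = (sigmoid(X @ theta) > 0.5).astype(float)
print(iters, np.mean(pred != y))
```

In your own run, record the empirical risk and the 0/1 classification error at each iteration so you can plot both curves, and note how sensitive the iteration count is to η and ε.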