Machine Learning Homework 2 Comp540
The code base hw2.zip for the assignment is attached to Assignment 2 on Canvas. You will add your code at the indicated spots in its files. Place your typeset answers to Problems 1 and 2 in a file called writeup.pdf and add it to the zip archive. Upload the entire archive back to Canvas before the due date and time.

1 Gradient and Hessian of NLL(θ) for logistic regression (10 points)

• (2 points) Let $g(z) = \frac{1}{1+e^{-z}}$. Show that $\frac{\partial g(z)}{\partial z} = g(z)(1 - g(z))$.
• (4 points) Using the previous result and the chain rule of calculus, derive the following expression for the gradient of the negative log likelihood function NLL(θ) for logistic regression:
$$\frac{\partial}{\partial \theta} \mathrm{NLL}(\theta) = \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$$
• (4 points) The Hessian, or second derivative, of NLL(θ) can be written as $H = X^T S X$, where
$$S = \mathrm{diag}\left(h_\theta(x^{(1)})\left(1 - h_\theta(x^{(1)})\right), \ldots, h_\theta(x^{(m)})\left(1 - h_\theta(x^{(m)})\right)\right)$$
Show that H is positive definite. You may assume that $0 < h_\theta(x^{(i)}) < 1$, so that the diagonal entries of S are strictly positive, and that X is full rank.

2 Properties of L2 regularized logistic regression (20 points)

Consider minimizing the L2 penalized logistic regression cost function:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{d} \theta_j^2$$
for a data set $D = \{(x^{(i)}, y^{(i)}) \mid 1 \le i \le m;\ x^{(i)} \in$ … 0 or 0 otherwise.

Fitting regularized logistic regression models (L2 and L1) (8 points)

For each representation of the features, we will fit L1 and L2 regularized logistic regression models. Your task is to complete the function select_lambda_crossval in utils.py to select the best regularization parameter λ by 10-fold cross-validation on the training data. This function takes a training set X and y and sweeps a range of λ values from lambda_low to lambda_high in steps of lambda_step. Default values for these parameters are in logreg_spam.ipynb. For each λ, divide the training data into ten equal portions using the KFold function from sklearn.cross_validation.
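A minimal sketch of the splitting step, on hypothetical toy data standing in for the spam training set. It assumes a current scikit-learn, where KFold lives in sklearn.model_selection (the sklearn.cross_validation module was deprecated in version 0.18 and removed in 0.20). Shuffling with a fixed random_state is a choice made for this sketch, not something the assignment specifies.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 20 examples with 3 features, standing in for the spam training set.
X = np.arange(60, dtype=float).reshape(20, 3)
y = np.array([0, 1] * 10)

# Ten equal portions; shuffle/random_state are choices made for this sketch.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
folds = list(kf.split(X))

for k, (train_idx, test_idx) in enumerate(folds):
    # Nine portions train the model; the remaining portion is held out for testing.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(f"fold {k}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```

With 20 rows, every fold reports 18 training rows and 2 test rows, and each row lands in exactly one test portion.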
Train a regularized sklearn logistic regression model (of the L1 and of the L2 variety) on nine of those portions and test its accuracy on the held-out portion. The accuracy of a model trained with that λ is the average of the ten test accuracies you obtain. Do this for every λ in the swept range and return the λ that yields the highest accuracy. logreg_spam.ipynb will then build the regularized model with the best λ you calculate, for both L1 and L2 regularization, and determine the training and test set accuracies of the model. You should see test set accuracies between 91% and 94% with the different feature transforms and the two regularization schemes. Comment on the model sparsities with L1 and L2 regularization. Which class of models would you recommend for this data set, and why?

What to turn in

Please zip up all the files in the archive (including files that you did not modify) and submit it as hw2_netid.zip on Canvas before the deadline, where netid is replaced with your netid. Include a PDF file in the archive that presents your plots and a discussion of results from the programming component of the assignment. Also include typeset solutions to written Problems 1 and 2 of this assignment in your writeup. Only one submission per group of two, please.
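The λ sweep described in the fitting section can be sketched as below. This is a hedged illustration, not the reference solution: the signature of select_lambda_crossval, the penalty argument, the liblinear solver, the inclusive upper end of the sweep, and the mapping C = 1/λ are all assumptions. The mapping comes from multiplying J(θ) by m/λ, which turns the L2 objective into (1/λ)·Σ loss + ½·Σθ_j², scikit-learn's form with C = 1/λ (except that liblinear also penalizes the intercept). The demo uses synthetic data from make_classification and counts zero coefficients as a rough sparsity measure; its printed numbers depend on the data and are not the spam results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def select_lambda_crossval(X, y, lambda_low, lambda_high, lambda_step, penalty):
    """Return the lambda with the highest mean 10-fold cross-validation accuracy.

    Sketch only: the signature is assumed, and lambda is mapped to
    scikit-learn's inverse regularization strength via C = 1 / lambda.
    """
    if lambda_low <= 0:
        raise ValueError("lambda_low must be positive because C = 1 / lambda")
    # Sweep lambda_low..lambda_high inclusive; the half-step guards against float round-off.
    lambdas = np.arange(lambda_low, lambda_high + lambda_step / 2, lambda_step)
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    best_lambda, best_acc = None, -np.inf
    for reg in lambdas:
        fold_accs = []
        for train_idx, test_idx in kf.split(X):
            # Train on nine portions, measure accuracy on the held-out tenth.
            model = LogisticRegression(penalty=penalty, C=1.0 / reg, solver="liblinear")
            model.fit(X[train_idx], y[train_idx])
            fold_accs.append(model.score(X[test_idx], y[test_idx]))
        mean_acc = np.mean(fold_accs)
        # Strict ">" keeps the smallest lambda among ties, since the sweep is ascending.
        if mean_acc > best_acc:
            best_lambda, best_acc = reg, mean_acc
    return best_lambda


if __name__ == "__main__":
    # Synthetic stand-in for the spam training data (hypothetical).
    X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
    for penalty in ("l1", "l2"):
        best = select_lambda_crossval(X, y, 0.1, 2.0, 0.1, penalty)
        model = LogisticRegression(penalty=penalty, C=1.0 / best, solver="liblinear").fit(X, y)
        # Count exactly-zero coefficients to compare model sparsity.
        n_zero = int(np.sum(model.coef_ == 0))
        print(penalty, round(float(best), 2), model.score(X, y), n_zero)
```

L1 penalties tend to drive some coefficients exactly to zero, while L2 penalties shrink them without zeroing them, which is the contrast the sparsity comparison is meant to surface.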