Machine Learning  Homework #3

1. Submit a hard-copy report in class before lecture and upload your code to BeachBoard.
2. The hard copy is due in class before lecture; the electronic copy is due at 11:59 PM on
BeachBoard on the due date.
3. An unlimited number of submissions is allowed on BeachBoard; the latest one will be graded.
1. (10 points) Exercise 3.6 (page 92) in LFD.
2. (10 points) Exercise 3.7 (page 92) in LFD.
3. (20 points) Recall that the objective function for linear regression can be expressed as
$$E(w) = \frac{1}{N}\|Xw - y\|^2,$$
as in Equation (3.3) of LFD. Minimizing this function with respect to $w$ leads to the optimal
$w$ as $(X^T X)^{-1} X^T y$. This solution holds only when $X^T X$ is nonsingular. To overcome
this problem, the following objective function is commonly minimized instead:
$$E_2(w) = \|Xw - y\|^2 + \lambda\|w\|^2,$$
where $\lambda > 0$ is a user-specified parameter. Please do the following:
(a) (10 points) Derive the optimal $w$ that minimizes $E_2(w)$.
(b) (10 points) Explain how this new objective function can overcome the singularity problem
of $X^T X$.
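To make the singularity problem concrete before deriving anything, the following is a minimal sketch in plain Python (standard library only). It builds a small matrix whose second column is twice the first, checks that the determinant of $X^T X$ is zero, and evaluates the regularized objective at two different weight vectors that produce identical predictions. The helper names `gram`, `det2`, and `E2`, the toy data, and the value 0.1 for $\lambda$ are all illustrative assumptions; none of them come from the assignment's starter code, and the sketch does not compute the minimizer asked for in part (a).

```python
def gram(X):
    """Return X^T X for a matrix stored as a list of rows."""
    d = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]


def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def E2(w, X, y, lam):
    """Regularized objective ||Xw - y||^2 + lam * ||w||^2."""
    residuals = [sum(wi * xi for wi, xi in zip(w, row)) - yn for row, yn in zip(X, y)]
    return sum(r * r for r in residuals) + lam * sum(wi * wi for wi in w)


# Toy data (assumed): the second column is exactly twice the first,
# so the columns of X are linearly dependent.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [1.0, 2.0, 3.0]

G = gram(X)               # [[14, 28], [28, 56]]
singular_det = det2(G)    # 14*56 - 28*28 = 0, so X^T X has no inverse

# Two different weight vectors that give the same predictions Xw = y.
w_a = [1.0, 0.0]
w_b = [0.0, 0.5]

print("det(X^T X) =", singular_det)
print("lam = 0.0:", E2(w_a, X, y, 0.0), E2(w_b, X, y, 0.0))
print("lam = 0.1:", E2(w_a, X, y, 0.1), E2(w_b, X, y, 0.1))
```

With $\lambda = 0$ the two weight vectors are indistinguishable to the objective, while any positive $\lambda$ assigns them different values. This observation is a useful starting point for part (b), but it is not a substitute for the argument the question asks for.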
4. (35 points) In logistic regression, the objective function can be written as
$$E(w) = \frac{1}{N}\sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right).$$
Please
(a) (10 points) Compute the first-order derivative ∇E(w). Show the intermediate steps of
your derivation.
(b) (10 points) Once the optimal $w$ is obtained, it will be used to make predictions as follows:
$$\text{Predicted class of } x = \begin{cases} 1 & \text{if } \theta(w^T x) \ge 0.5 \\ -1 & \text{if } \theta(w^T x) < 0.5 \end{cases}$$
where the function $\theta(z) = \frac{1}{1 + e^{-z}}$ looks like
[Figure: plot of $\theta(z)$]
Explain why the decision boundary of logistic regression is still linear, even though the
linear signal $w^T x$ is passed through a nonlinear function $\theta$ to compute the outcome
of the prediction.
(c) (5 points) Is the decision boundary still linear if the prediction rule is changed to the
following? Justify briefly.
$$\text{Predicted class of } x = \begin{cases} 1 & \text{if } \theta(w^T x) \ge 0.9 \\ -1 & \text{if } \theta(w^T x) < 0.9 \end{cases}$$
(d) (10 points) In light of your answers to the above two questions, what is the essential
property of logistic regression that results in the linear decision boundary?
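When working on this problem it can help to have a way to evaluate the objective and to sanity-check a hand-derived gradient numerically. The sketch below, in plain Python, evaluates $\theta(z)$ and $E(w)$ in an overflow-safe way and approximates ∇E(w) by central differences. It deliberately does not contain the analytic gradient asked for in part (a). The function names, the step size `h`, and the toy data are illustrative assumptions and are not part of the starter code.

```python
import math


def theta(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log1p_exp(t):
    """Numerically stable ln(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def E(w, X, y):
    """Objective (1/N) * sum_n ln(1 + exp(-y_n * w^T x_n))."""
    total = 0.0
    for xn, yn in zip(X, y):
        signal = sum(wi * xi for wi, xi in zip(w, xn))
        total += log1p_exp(-yn * signal)
    return total / len(X)


def numerical_gradient(w, X, y, h=1e-6):
    """Central-difference approximation of grad E(w), one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((E(w_plus, X, y) - E(w_minus, X, y)) / (2 * h))
    return grad


# Toy data (assumed): three augmented inputs (leading 1) with labels in {+1, -1}.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1]]
y = [1, -1, 1]
w0 = [0.0, 0.0, 0.0]

print("theta(0) =", theta(0.0))
print("E(w0) =", E(w0, X, y))  # equals ln 2 at w = 0, since every term is ln(1 + e^0)
print("numerical gradient at w0 =", numerical_gradient(w0, X, y))
```

A derived expression for ∇E(w) can be compared against `numerical_gradient` at a few random points. Agreement to several decimal places is good evidence, though not proof, that the derivation is correct.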
5. (35 points) Logistic Regression for Handwritten Digit Recognition: Implement logistic
regression for classification, using gradient descent to find the best separator. The
handwritten digit files are in the “data” folder: train.txt and test.txt. The starter code is
in the “code” folder. In each data file, each row is one data example. The first entry is the
digit label (“1” or “5”), and the next 256 entries are grayscale values between -1 and 1. The
256 pixels correspond to a 16 × 16 image. You are expected to implement your solution based on
the given code. The only file you need to modify is “solution.py”. You can test your solution
by running “main.py”. Note that code is provided to compute a two-dimensional feature (symmetry
and average intensity) from each digit image; that is, each digit image is represented by a
two-dimensional vector before being augmented with a “1” to form a three-dimensional vector,
as discussed in class. These features, along with the corresponding labels, should serve as
inputs to your logistic regression algorithm.
(a) (15 points) Complete the logistic regression() function for classifying the digits
“1” and “5”.
(b) (5 points) Complete the accuracy() function for measuring the classification accuracy
on your training and test data.
(c) (5 points) Complete the thirdorder() function to transform the features into the 3rd-order
polynomial Z-space.
(d) (10 points) Run “main.py” to see the classification results. As your final deliverable to a
customer, would you use the linear model with or without the 3rd-order polynomial
transform? Briefly explain your reasoning.
Deliverable: You should submit (1) a hard-copy report (along with your write-up for the other
questions) that summarizes your results, before the lecture, and (2) the “solution.py” file to
BeachBoard.
Note: Please read the “Readme.txt” file carefully before you start this assignment. Please
do NOT change anything in the “main.py” and “helper.py” files when you program.
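To illustrate the data layout described in this problem, here is a minimal sketch in plain Python that splits one row into its digit label and a 16 × 16 image, and shows the augmentation of a two-dimensional feature vector with a leading 1. It rests on several assumptions: that the values in a row are whitespace-separated, that digit “1” maps to class +1 and digit “5” to class −1, and that the function names `parse_row`, `augment`, and `to_signed_label` are free to choose. None of these is taken from the provided “helper.py” or “main.py”, which already handle loading and feature extraction, so this is for understanding the format only.

```python
def parse_row(line):
    """Split one text row into (digit label, 16x16 image as a list of 16 rows).

    ASSUMPTION: the 257 values in a row are separated by whitespace.
    """
    values = [float(tok) for tok in line.split()]
    if len(values) != 257:
        raise ValueError(f"expected 257 values, got {len(values)}")
    digit = int(values[0])
    pixels = values[1:]
    image = [pixels[r * 16:(r + 1) * 16] for r in range(16)]
    return digit, image


def augment(features):
    """Prepend the constant 1 to a feature vector: (x1, x2) -> (1, x1, x2)."""
    return [1.0] + list(features)


def to_signed_label(digit):
    """ASSUMPTION: digit 1 is the +1 class and digit 5 is the -1 class."""
    return 1 if digit == 1 else -1


# Synthetic row for illustration only: label 5 followed by 256 pixels of -1.0.
example_line = "5 " + " ".join(["-1.0"] * 256)
digit, image = parse_row(example_line)

print(digit, len(image), len(image[0]))
print(to_signed_label(digit))
print(augment([0.2, -0.5]))  # hypothetical (symmetry, intensity) values
```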