CSE512 – Machine Learning
Homework 3, Version 1
This homework contains 2 questions. The last question requires programming. The maximum number
of points is 100 plus 10 bonus points.
1 Question 1 – Naive Bayes and Logistic Regression (40 points)
1.1 Naive Bayes with both continuous and boolean variables (20 points)
Consider learning a function X → Y where Y is boolean, where X = (X1, X2), and where X1 is a boolean
variable and X2 a continuous variable. State the parameters that must be estimated to define a Naive Bayes
classifier in this case. Give the formula for computing P(Y |X), in terms of these parameters and the feature
values X1 and X2.
1.2 Naive Bayes and Logistic Regression with Boolean variables (20 points)
In class, we showed that when Y is Boolean and X = (X1, · · · , Xd) is a vector of continuous variables, the
assumptions of the Gaussian Naive Bayes classifier imply that P(Y |X) is given by the logistic function
with appropriate parameters θ. In particular:
P(Y = 1 | X) = \frac{1}{1 + \exp\left(-\left(\sum_{i=1}^{d} \theta_i X_i + \theta_{d+1}\right)\right)}    (1)
Consider instead the case where Y is Boolean and X = (X1, · · · , Xd) is a vector of Boolean variables.
Prove for this case also that P(Y |X) follows this same form (and hence that Logistic Regression is also the
discriminative counterpart to a Naive Bayes generative classifier over Boolean features).
2 Question 2 – Implementation of Logistic Regression (60 points + 10 bonus)
In this Question, you will implement Logistic Regression using Stochastic gradient descent optimization.
Suppose the training data is {(X^1, Y^1), · · · , (X^n, Y^n)}, where X^i is a column vector of d dimensions and
Y^i is the binary target label. For a column vector X, let \bar{X} denote [X; 1], the vector obtained by appending
1 to the end of X. Logistic regression assumes the following probability function:
P(Y = 1 | X; \theta) = \frac{1}{1 + \exp(-\theta^T \bar{X})}    (2)
Logistic regression minimizes the negative conditional log likelihood:
-\sum_{i=1}^{n} \log P(Y^i | \bar{X}^i; \theta)    (3)
An equivalent minimization problem is to minimize the average negative conditional log likelihood:
L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log P(Y^i | \bar{X}^i; \theta)    (4)
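For concreteness, here is a minimal NumPy sketch of the probability function in Eq. (2) and the loss in Eq. (4). The array and function names (Xbar, predict_proba, loss) are illustrative and are not part of the provided skeleton code.

import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z)); clipping avoids overflow for large |z|.
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(Xbar, theta):
    # Xbar: n x (d+1) matrix whose rows are the augmented vectors [X; 1].
    # Returns P(Y = 1 | Xbar; theta) for every row, as in Eq. (2).
    return sigmoid(Xbar @ theta)

def loss(Xbar, Y, theta, eps=1e-12):
    # Average negative conditional log-likelihood L(theta), as in Eq. (4).
    p = predict_proba(Xbar, theta)
    return -np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))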
To minimize this loss function, we can use gradient descent:
\theta = \theta - \eta \frac{\partial L}{\partial \theta}    (5)
where
\frac{\partial L}{\partial \theta} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log P(Y^i | \bar{X}^i; \theta)}{\partial \theta}    (6)
This gradient is computed by enumerating over all the training data. It turns out that it can be approximated using a batch of training data. Suppose B is a subset of {1, 2, · · · , n}; then:
\frac{\partial L}{\partial \theta} \approx -\frac{1}{\mathrm{card}(B)} \sum_{i \in B} \frac{\partial \log P(Y^i | \bar{X}^i; \theta)}{\partial \theta}    (7)
This leads to the following stochastic gradient descent algorithm:
Algorithm 1 Stochastic gradient descent for Logistic Regression
1: Inputs: {(X^i, Y^i)}_{i=1}^{n} (for data), m (for batch size), η0, η1 (for step size), max_epoch, δ (for the stopping criterion)
2: for epoch = 1, 2, ..., max epoch do
3: η ← η0/(η1 + epoch)
4: (i1, ..., in) = permute(1, ..., n)
5: Divide (i1, ..., in) into batches of size m or m + 1
6: for each batch B do
7: Update θ using Eqs. (5) and (7)
8: Break if L(θ_new) > (1 − δ)L(θ_old) // not much progress, terminate
9: Outputs: θ.
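The following is a minimal NumPy sketch of Algorithm 1, assuming the sigmoid/predict_proba/loss helpers sketched earlier and using the per-example gradient from Eq. (8); the variable names and exact batching are illustrative, not the required interface of the skeleton notebook.

import numpy as np

def sgd_logistic_regression(Xbar, Y, m=16, eta0=0.1, eta1=1.0,
                            max_epoch=1000, delta=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, d1 = Xbar.shape
    theta = np.zeros(d1)
    losses = [loss(Xbar, Y, theta)]
    for epoch in range(1, max_epoch + 1):
        eta = eta0 / (eta1 + epoch)                      # line 3: decaying step size
        order = rng.permutation(n)                       # line 4: random permutation
        for B in np.array_split(order, max(1, n // m)):  # line 5: batches of size m or m+1
            p = predict_proba(Xbar[B], theta)
            grad = -(Y[B] - p) @ Xbar[B] / len(B)        # Eq. (7) combined with Eq. (8)
            theta = theta - eta * grad                   # line 7: update via Eq. (5)
        losses.append(loss(Xbar, Y, theta))
        if losses[-1] > (1 - delta) * losses[-2]:        # line 8: not much progress
            break
    return theta, losses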
2.1 Derivation (10 points)
Prove that:
\frac{\partial \log P(Y^i | \bar{X}^i; \theta)}{\partial \theta} = \left(Y^i - P(Y = 1 | \bar{X}^i; \theta)\right) \bar{X}^i    (8)
2.2 Hand Classification
In this question of the homework, you will work with image data. Your task is to classify between hand and not-hand images. The data has already been split into train, validation, and test sets. For the training and
validation sets, the first half of the images are hands and the second half are not-hands.
Raw image intensity values are not robust features for classification. In this question, we will use Histogram of Oriented Gradients (HOG) as image features. HOG uses gradient information instead of intensities, which is more robust to changes in color and illumination conditions. See [1] for more information
about HOG.
We will implement this question in Python. To use HOG, you will need to install skimage, an excellent
image processing library. Note that you should not call HOG directly. Use the supplied helper
function load_dataset() instead; it will call HOG when loading the dataset.
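For reference only (the homework expects you to go through load_dataset()), the snippet below shows roughly what HOG feature extraction with skimage looks like; the file path and HOG parameters are illustrative assumptions, not necessarily those used by the helper.

from skimage.feature import hog
from skimage.io import imread

image = imread("data/train/0.png", as_gray=True)   # hypothetical example image
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
print(features.shape)   # a single flat feature vector for this image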
The provided homework directory has the following structure:
root
– hw-3.pdf
– logistic_regression.ipynb
– requirements.txt
– data (obtain from Kaggle competition page)
—- train (8170 images, 0-4084.png: hand, 4085-8169.png: not-hand)
—- val (2724 images, 0-1361.png: hand, 1362-2723.png: not-hand)
—- test (5542 images)
2.3 Implement Logistic Regression with SGD (50 points + 10 bonus)
We have provided the skeleton code in the form of a Jupyter Notebook. Your task is to complete the implementations in it.
1. (10 points) Run your implementation on the provided training data with max_epoch = 1000, m =
16, η0 = 0.1, η1 = 1, δ = 0.0001.
(a) Report the number of epochs that your algorithm takes before exiting.
(b) Plot the curve showing L(θ) as a function of epoch.
(c) What is the final value of L(θ) after the optimization?
2. (10 points) Keep m = 16 and δ = 0.0001, and experiment with different values of η0 and η1. Can you find a
pair of parameters (η0, η1) that leads to faster convergence?
(a) Report the values of (η0, η1). How many epochs does it take? What is the final value of L(θ)?
(b) Plot the curve showing L(θ) as a function of epoch.
3. (10 points) Evaluate the performance on validation data
(a) Plot L(θ) as a function of epoch. On the same plot, show two curves, one for training and one
for validation data.
(b) Plot the accuracy as a function of epoch. On the same plot, show two curves, one for training
and one for validation data.
4. (10 points) With the learned classifier (you can produce these plots with scikit-learn in Python; install it with
pip install scikit-learn; a minimal sketch appears after this list):
(a) Plot the ROC curve on validation data. Report the area under the curve.
(b) Plot the Precision-Recall curve on validation data. Report the average precision.
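Here is a minimal scikit-learn sketch of the ROC and Precision-Recall computations in item 4; y_val and val_scores are placeholders for your validation labels and predicted probabilities, not names from the skeleton code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score

# Placeholders: replace with your validation labels and P(Y = 1 | x) scores.
y_val = np.array([1, 1, 1, 0, 0, 0])
val_scores = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])

fpr, tpr, _ = roc_curve(y_val, val_scores)
print("Area under the ROC curve:", auc(fpr, tpr))
plt.plot(fpr, tpr)
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")

prec, rec, _ = precision_recall_curve(y_val, val_scores)
print("Average precision:", average_precision_score(y_val, val_scores))
plt.figure()
plt.plot(rec, prec)
plt.xlabel("Recall"); plt.ylabel("Precision")
plt.show()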
2.4 Submit the result in Kaggle (10 points + 10 bonus)
1. (10 points) Run your classifier on the test data and submit the result file on Kaggle (https://
www.kaggle.com/t/dbe3655b7c01457d8a92348881887dd1). Report the best accuracy
you obtain on the test data, as returned by the Kaggle website.
2. (10 bonus points) Optimize your classifier to compete for the leading positions; the top 3 positions
will receive the bonus points! You can use the validation data to tune your classifier. Once you have
identified the best combination of hyper-parameters, you can train your classifier on the validation data
together with the training data to generate the test predictions for the Kaggle submission. You can be
creative with the construction of feature vectors, but you cannot use deep-learning (CNN) features of the images.
NOTE: Make sure the names displayed on the leader-board (your Kaggle user-names) can be used by
the graders to uniquely identify you (SBU ID is recommended).
3 What to submit?
3.1 Blackboard submission
You will need to submit both your code and your answers to the questions on Blackboard. Put the answer file and
your code in a folder named SUBID_FirstName_LastName (e.g., 10947XXXX_lionel_messi). Zip this folder
and submit the zip file on Blackboard. Your submission must be a zip file, i.e., SUBID_FirstName_LastName.zip.
The answer file should be named answers.pdf. The first page of answers.pdf should be the filled-in cover
page at the end of this homework. The remainder of the answer file should contain:
1. Answers to Questions 1.1 and 1.2.
2. Answers to Question 2.1 and the requested plots and numbers for 2.3 and 2.4.
3. For 2.3, the completed .ipynb file with the numbers and plots asked for in the sub-questions.
If any explanation is asked for, include it in the notebook as a Markdown cell.
4. For 2.4, the accuracy reported by the Kaggle website.
3.2 Prediction submission
For Question 2.4, you must submit a .csv file to get the accuracy through Kaggle. A submission file should
contain two columns: Id and Class. The file should contain a header and have the following format:
Id, Class
1, 1
2, 10
... ...
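A minimal sketch of writing such a submission file with Python's csv module follows; test_predictions is a placeholder for your classifier's predicted classes on the test images, and the exact label encoding should follow the competition page.

import csv

test_predictions = [1, 0, 1, 1, 0]   # placeholder: your predictions on the test set

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Class"])                     # header row
    for i, label in enumerate(test_predictions, start=1):
        writer.writerow([i, label])                      # Id starts at 1, as in the example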
You can only make 3 submissions per day. In the end, your top-2 entries (you can manually choose
them) will be evaluated on a held-out division of the test set. The final rankings will be those obtained
on this private leaderboard.
4 Cheating warnings
Don’t cheat. You must do the homework yourself, otherwise you won’t learn. You must use your SBU ID
as your file name for the competition. Do not fake your Stony Brook ID to bypass the submission limitation
per 24 hours. Doing so will be considered cheating.
5 Reference
[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2005.
Cover page for answers.pdf
CSE512 Fall 2018 - Machine Learning - Homework 3
Your Name:
Solar ID:
NetID email address:
Names of people whom you discussed the homework with: