CSCI 635: Introduction to Machine Learning
Homework 3: Nonlinear Prediction

Instructions: The assignment is out of 100 points. Submit your assignment through the Dropbox in MyCourses as two files zipped into one container (named according to the convention
<your_last_name>-hw2.zip). The individual files are:
1. A *.pdf file for the write-up, named a2.pdf. While your Python scripts will allow you to
save your plots to disk, please also copy/insert the generated plots into your document,
mapped to the correct problem number. For answers to the various questions throughout,
copy the question(s) being asked and write your answer underneath (along with the problem
number).
2. The *.py files for your answers/implementations to Questions 1a, 1b, 1c, and 2, named
q1a.py, q1b.py, q1c.py, and q2.py.
Data: In this assignment we will use the MNIST dataset, which contains 28 × 28 images of
handwritten digits. The other datasets come from the last homework assignment (HW #2).
Grade penalties will be applied for (but are not limited to):
• Not submitting the write-up as instructed above.
• Submitting code with incorrect file names.
• Using any machine-learning-specific library for your code, e.g., TensorFlow, PyTorch.
In this assignment, you will implement, in raw numpy/scipy, a nonlinear model, i.e., the multilayer
perceptron (MLP), to tackle the problems to which you applied your Maximum Entropy and naïve Bayes
models in the last homework. Furthermore, you will design the MLP’s parameter update procedure
– backpropagation of errors – and optimize it with stochastic gradient descent (SGD). As in the last
homework, your focus will be on multiclass classification. Finally, you will also apply and train your
MLP to tackle the infamous MNIST dataset, which is a larger-scale, more challenging problem.
Problem #1: Learning a Multilayer Perceptron Model (60 points)
Problem #1a: The XOR Problem Revisited (30)
You will start by first revisiting the classical XOR problem you attempted to solve in the last homework
with maximum entropy. The data for XOR is in xor.dat (from last homework).
First implement a single hidden layer MLP (this means that Θ will have two sets of weights and
two bias vectors, Θ = {W1, b1, W2, b2}) and perform a gradient check as you did in the last
assignment. You should observe agreement between your analytic solution and the numerical solution,
up to a reasonable number of decimal places, in order for this part of the problem to be counted correct.
Make sure you check your gradients before you proceed with the rest of this problem. Make sure the
program returns “CORRECT” for each parameter value, i.e., for any specific parameter θj in Θ, we
return “CORRECT” if |∇̃θj − ∇θj| < 1e−4, where ∇̃θj is the numerical gradient and ∇θj is the analytic gradient.
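For reference, a numerical gradient check could be sketched as below. This is only one possible implementation: the helper compute_loss(X, y, Theta) and the dictionary layout of Theta and grads are assumptions about your own code, not requirements.

```python
# Hypothetical sketch of a centered finite-difference gradient check.
# Assumes: compute_loss(X, y, Theta) returns a scalar loss, Theta is a dict
# of numpy arrays (e.g., {"W1": ..., "b1": ..., "W2": ..., "b2": ...}), and
# grads holds your analytic gradients under the same keys.
import numpy as np

def check_gradients(X, y, Theta, grads, compute_loss, eps=1e-4, tol=1e-4):
    """Compare analytic gradients against centered finite differences."""
    for name, param in Theta.items():
        numeric = np.zeros_like(param)
        it = np.nditer(param, flags=["multi_index"])
        while not it.finished:
            idx = it.multi_index
            old = param[idx]
            param[idx] = old + eps              # perturb theta_j upward
            loss_plus = compute_loss(X, y, Theta)
            param[idx] = old - eps              # perturb theta_j downward
            loss_minus = compute_loss(X, y, Theta)
            param[idx] = old                    # restore the parameter
            numeric[idx] = (loss_plus - loss_minus) / (2.0 * eps)
            it.iternext()
        diff = np.max(np.abs(numeric - grads[name]))
        print(name, "CORRECT" if diff < tol else "INCORRECT", diff)
```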
Once you have finished gradient-checking, fit a single hidden layer MLP to the XOR dataset. Record
what you tried during your training process and any observations (such as those relating to the model’s ability
to fit the data) as well as your final accuracy. Code goes into a file you will name q1a.py. Contrast
your observations of this model’s function approximation ability with that of the maximum entropy
model you fit in the last homework.
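For orientation only, a minimal single-hidden-layer MLP trained on XOR with full-batch gradient descent might look like the sketch below; the tanh hidden layer, softmax output, layer sizes, initialization scale, and learning rate are illustrative choices, not the required design.

```python
# A minimal sketch (not the required solution) of a one-hidden-layer MLP
# fit to XOR with full-batch gradient descent in raw numpy.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                       # XOR labels
Y = np.eye(2)[y]                                 # one-hot targets

rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.standard_normal((2, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.standard_normal((8, 2)), np.zeros(2)
lr = 0.5

for epoch in range(2000):
    # forward pass: tanh hidden layer, softmax output
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # backward pass for softmax + cross-entropy
    d_logits = (p - Y) / len(X)
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    d_h = d_logits @ W2.T * (1.0 - h ** 2)       # tanh derivative
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# evaluate after training
h = np.tanh(X @ W1 + b1)
pred = (h @ W2 + b2).argmax(axis=1)
print("final accuracy:", (pred == y).mean())
```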
Problem #1b: The Spiral Problem Revisited (15)
Now it is time to fit your MLP to a multi-class problem (generally defined as k > 2 classes, i.e., beyond binary
classification). The spirals dataset, as mentioned in the last assignment, represents an excellent toy
problem for observing nonlinearity.
Instead of the XOR data, you will now load in the spiral dataset file, spiral_train.dat. Train
your MLP on the data and then plot its decision boundary (the process for this MLP would be the
same as for the maximum entropy model, much as you did in the last assignment). Please save and
update your answer document with the generated plot once you have fit the MLP model as best you
can to the data. Make sure you comment on your observations and describe any insights/steps taken
in tuning the model. You will certainly want to re-use your code as you work through the sub-problems to
minimize implementation bugs/errors. Do NOT forget to report your accuracy. Code goes into a file
you will name q1b.py.
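One common way to render the decision boundary is to evaluate the trained model on a dense grid of points and color the resulting regions, as in the hedged sketch below; the predict(X) helper (returning class indices for an (N, 2) array), the mesh resolution, and the color maps are assumptions, not requirements.

```python
# Hypothetical sketch of plotting a 2-D decision boundary with matplotlib.
# Assumes a predict(X) function that maps an (N, 2) array to class indices.
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(predict, X, y, step=0.02, fname="spiral_boundary.png"):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # predict a class for every grid point, then color the regions
    Z = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.4, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=15, cmap=plt.cm.Spectral)
    plt.savefig(fname)
    plt.show()
```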
Problem #1c: IRIS Revisited (15)
You will now fit the MLP to the IRIS dataset, which has a separate validation set in addition to the
training set. You will have two quantities to track – the training loss and the validation loss.
Fit/tune your MLP to the IRIS dataset, iris_train.dat. Note that since you
are concerned with out-of-sample performance, you must estimate generalization accuracy by using
the validation/development dataset, /problems/HW2/data/iris_test.dat. Code goes into
a file you will name q1c.py.
Note that for this particular problem, you will want to make sure your parameter optimization
operates with mini-batches instead of the full batch of data to calculate gradients (in case you did
this for the previous sub-problems). As a result, you will have to write your own mini-batch creation
function, i.e., a create_mini_batch(·) sort of routine, where you draw randomly without replacement
M data vectors from X (and their respective labels from y). This will be necessary in order to write a
useful/successful training loop.
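As one possible shape for this routine (the function name and the yielded tuple format are illustrative only), consider the sketch below; each epoch you would iterate over its output and perform one SGD update per mini-batch.

```python
# One possible mini-batch routine: shuffle the indices once per epoch and
# slice them into batches, so samples are drawn without replacement.
import numpy as np

def create_mini_batches(X, y, batch_size, rng=None):
    """Yield (X_batch, y_batch) pairs drawn without replacement."""
    rng = rng or np.random.default_rng()
    indices = rng.permutation(len(X))            # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]
```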
Besides recording your accuracy for both your training and development sets, track your loss as a
function of epoch. Create a plot with both of these curves superimposed. Furthermore, consider
the following questions: What ultimately happens as you train your model for more epochs? What
phenomenon are you observing and why does this happen? What is a way to control for this?
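A small plotting sketch for the superimposed curves is given below; it assumes you accumulate train_losses and valid_losses (one entry per epoch) inside your own training loop.

```python
# Hypothetical sketch for superimposing the training and validation loss
# curves; train_losses and valid_losses are assumed to be per-epoch lists.
import matplotlib.pyplot as plt

def plot_loss_curves(train_losses, valid_losses, fname="iris_loss.png"):
    epochs = range(1, len(train_losses) + 1)
    plt.plot(epochs, train_losses, label="training loss")
    plt.plot(epochs, valid_losses, label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(fname)
    plt.show()
```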
Problem #2: Image Categorization with MLPs (40 points)
Adapt your MLP’s softmax output layer by making the outputs correspond to a probability distribution
over the 10 object classes. Then, train the network using the MNIST training set. Use the cross-entropy loss metric, and use (mini-batch) stochastic gradient descent to train your network. You will
need to experiment with different initializations and learning rates.
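For reference, the 10-way softmax output and the mean cross-entropy loss can be written as the standalone numpy helpers sketched below; the function names are illustrative, and the max-subtraction is the usual numerical-stability trick.

```python
# Hedged sketch of a row-wise softmax and mean cross-entropy loss.
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true integer-coded classes."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))
```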
Training: Use a validation set: partition the training samples from each class into 80% training and
20% validation. The 80% will be used to tune network weights. The validation set will be used to
check generalization at each epoch. You should select a fixed number of epochs to run. Record where
error on the validation set increases, and then use the weights in the epoch before this overfitting
occurs as your final network weights.
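One way to build the per-class 80/20 split and remember the best-epoch weights is sketched below; the helper name and the dictionary layout of Θ are assumptions about your own code, not a specification.

```python
# Hypothetical sketch: stratified 80/20 split plus best-epoch bookkeeping.
import numpy as np

def stratified_split(X, y, train_frac=0.8, rng=None):
    """Split each class 80/20 so both subsets keep the class balance."""
    rng = rng or np.random.default_rng(0)
    train_idx, valid_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])   # shuffle within class
        cut = int(train_frac * len(idx))
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return X[train_idx], y[train_idx], X[valid_idx], y[valid_idx]

# inside the training loop, remember the weights from the best validation epoch:
# if valid_loss < best_loss:
#     best_loss, best_epoch = valid_loss, epoch
#     best_Theta = {name: W.copy() for name, W in Theta.items()}
```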
Once trained, run your network on the MNIST test set, and compute the recognition rate for each
individual class, and for all classes using your trained classifier. In the write-up, provide the following:
• a learning curve plot showing the cross-entropy vs. epoch for both the training set and the
validation set, and use a line to show the epoch where overfitting occurs
• a histogram showing the accuracy for all classes, and for each individual digit class
• a table containing 1 example of 1 correct and 1 incorrect test sample from each digit class.
Comment on the accuracies, success & failure examples & the learning curve; two paragraphs should
be sufficient.
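A sketch of computing the per-class and overall recognition rates (and of building the requested histogram) is given below; y_test and y_hat are assumed to be integer label arrays produced by your own pipeline.

```python
# Hedged sketch of per-class and overall accuracy on the MNIST test set.
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes=10):
    """Return an array of per-class accuracies plus the overall accuracy."""
    accs = np.array([(y_pred[y_true == c] == c).mean() for c in range(num_classes)])
    return accs, (y_pred == y_true).mean()

# example histogram (bar chart) of the accuracies, one bar per digit class:
# import matplotlib.pyplot as plt
# accs, overall = per_class_accuracy(y_test, y_hat)
# plt.bar(range(10), accs); plt.axhline(overall, linestyle="--", color="k")
# plt.xlabel("digit class"); plt.ylabel("accuracy"); plt.savefig("q2_hist.png")
```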
Code: Provide a main program q2.py, along with any supporting .py files in a zip, and a file containing
the saved model weights (see https://pytorch.org/docs/stable/notes/serialization.html).
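Since machine-learning-specific libraries cannot be used for the implementation itself, one reasonable option is to serialize the weights with plain numpy; the file name and dictionary keys below are just one possible convention, not a requirement.

```python
# Hypothetical sketch of saving/loading the MLP weights with numpy.
import numpy as np

def save_weights(Theta, fname="q2_weights.npz"):
    np.savez(fname, **Theta)                     # one array per parameter

def load_weights(fname="q2_weights.npz"):
    data = np.load(fname)
    return {key: data[key] for key in data.files}
```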