Assignment 3: The Neural Network Package


General instructions.

2. You need to submit your source code (self-contained, well documented, and
with clear instructions for how to run it) and a report via TEACH. In your
submission, please clearly indicate all your team members.
3. Your code will be tested on the designated server:
vm-cs434-1.engr.oregonstate.edu
Please make sure that your program can run on this server without any
issue.
4. Be sure to answer all the questions in your report. Your report should be
typed and submitted in PDF format. You will be graded on both your code
and the report. In particular, the clarity and quality of the report will be
worth 10% of the points, so please write your report in a clear and concise
manner. Clearly label your figures, legends, and tables.
1 Multi-layer Perceptron (MLP) for CIFAR10
For this assignment, you will use the PyTorch neural network package to perform
a set of experiments on the famous CIFAR10 dataset and report your results
and findings together with discussions of the results.
The CIFAR10 Dataset. The data can be downloaded from
https://www.cs.toronto.edu/~kriz/cifar.html
This webpage contains all the information that is needed for unpacking and
using the data.
The Neural Network Package and sample code. You will not be implementing your
own network training algorithm. Instead, you will use the neural network
package provided by PyTorch. The following page provides example code for
using PyTorch to train an MLP on the MNIST data:
https://github.com/CSCfi/machine-learning-scripts/blob/master/notebooks/pytorch-mnist-mlp.ipynb
Specifically, in this example (sketched in code below),
• the MLP has 3 layers (2 hidden layers and 1 output layer), with 50 hidden
units in each hidden layer using ReLU as the activation function, and 10
output nodes in the output layer with softmax activation;
• the loss function is the negative log-likelihood (NLL) loss (similar to
what is used for logistic regression, but for C = 10 classes);
• training is performed by applying Stochastic Gradient Descent (SGD) to
minimize the loss with a learning rate of 0.01;
• a dropout rate of 0.2 is applied to the first two layers, which randomly
zeroes out 20% of the inputs to these layers. This is an effective technique
for regularization and for preventing the co-adaptation of neurons.
For your experiments, you will create a multilayer perceptron neural network
and train it on the CIFAR10 dataset to predict the object class from the input
image. You can use the example code provided above for MNIST as the basis
and modify the necessary parts to work with the CIFAR10 data. The main
difference is that the CIFAR10 images are 32 by 32, with 3 values per pixel
(R, G, B), whereas MNIST images are 28 by 28 with only a single greyscale
value per pixel. You will also want to normalize all pixel values by dividing
by 255 so that they lie between 0 and 1. We will use the same output layer,
the same loss function, and the same optimizer for training. One way to load
and normalize the data is sketched below.
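One possible way to obtain the data in this form is through torchvision,
whose ToTensor() transform already divides raw pixel values by 255. This is
only an assumption about your setup; you may instead unpack the batch files
from the CIFAR10 webpage yourself, for example to keep the original
five-batch split used in the questions below.

import torch
from torchvision import datasets, transforms

# ToTensor() converts each HxWxC uint8 image with values in [0, 255] into a
# CxHxW float tensor with values in [0, 1], which handles the normalization.
transform = transforms.ToTensor()

train_set = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32, shuffle=False)

# Each CIFAR10 image is 3x32x32, so the MLP input dimension is 3 * 32 * 32 = 3072
# instead of 28 * 28 = 784 for MNIST.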
1. (20 pts) For the first set of experiments, you will create and train a
2-layer network with one hidden layer of 100 hidden nodes using the sigmoid
activation function. You will use the first four batches of images for
training and the remaining batch for validation. When training the network,
you need to monitor the loss (negative log loss) on the training data
(similar to what is shown in the Karpathy demo) as well as the error (or
accuracy) on the validation data, and plot them as a function of training
epochs. Train your network with the default parameters for dropout,
momentum, and weight decay, but experiment with different learning rates
(e.g., 0.1, 0.01, 0.001, 0.0001). What is a good learning rate that works for
this data and this network structure? Present your plots for the different
choices of learning rate to help justify your final choice. How do you decide
when to stop training? Evaluate your final trained network on the testing
data and report its accuracy. (A sketch of this network and the learning-rate
sweep is given below.)
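A sketch of the single-hidden-layer sigmoid network and the learning-rate
sweep might look like the following; the hidden size and the set of learning
rates come from the question, the class name is illustrative, and the rest of
the training loop is left for you to fill in.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    # One hidden layer of 100 sigmoid units and a 10-way log-softmax output.
    def __init__(self, n_hidden=100):
        super().__init__()
        self.fc1 = nn.Linear(3 * 32 * 32, n_hidden)
        self.fc2 = nn.Linear(n_hidden, 10)

    def forward(self, x):
        x = x.view(-1, 3 * 32 * 32)          # flatten the 3x32x32 image
        x = torch.sigmoid(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

criterion = nn.NLLLoss()
for lr in [0.1, 0.01, 0.001, 0.0001]:        # learning rates to compare
    model = TwoLayerNet()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # Train for a fixed number of epochs, recording the training loss and the
    # validation error after every epoch, then plot both against the epoch number.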
2. (15 pts) Repeat the same experiment as in (1), but use ReLU as the
activation function for the hidden layer, and report your results.
3. (25 pts) Experiment with the other parameters: dropout, momentum, and
weight decay. The goal is to improve the performance of the network by
changing these parameters. Please describe what you have tried for each of
these parameters. How do the choices influence the behavior of learning? Do
they change the convergence behavior of training? How do they influence the
testing performance? Please provide a summary of the results and discuss the
impact of these parameters. Note that your discussion/conclusions should be
supported by experimental evidence such as test accuracy, training loss
curves, and validation error curves. (A sketch of where each of these
parameters is set appears below.)
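In PyTorch, momentum and weight decay (L2 regularization) are arguments of
the SGD optimizer, while dropout is a layer inside the model. The values
below are placeholders to show where each parameter goes, not recommended
settings.

import torch
import torch.nn as nn

# A toy model only to show where dropout enters; in your experiments this
# would be the CIFAR10 network from the earlier questions.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 100),
    nn.ReLU(),
    nn.Dropout(p=0.2),             # dropout rate: one parameter to vary
    nn.Linear(100, 10),
    nn.LogSoftmax(dim=1),
)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,            # your chosen learning rate from (1)/(2)
                            momentum=0.9,       # momentum: a parameter to vary
                            weight_decay=1e-4)  # weight decay: a parameter to vary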
4. (30 pts) Now we will alter the structure of the network. In particular,
we will keep the same total number of hidden nodes (100) but split them into
two hidden layers (50 each).¹ Train the new network with the same loss
function, the SGD optimizer, and your choice of activation function for
the hidden layers. What do you observe in terms of training convergence
behavior? Do you find one structure to be easier to train than the other?
How about the final performance: which one gives you better testing
performance? Provide a discussion of the results, with the necessary plots
and figures to support it. (A sketch contrasting the two structures is given
after the footnote below.)
¹ In case this does not produce anything significantly different, feel free
to consider even deeper alternatives.
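For reference, the two structures being compared could be written as follows;
ReLU is used here only as an example activation, and you should substitute
your own choice.

import torch.nn as nn

# Structure A: one hidden layer with 100 units (questions 1 and 2).
one_hidden_layer = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
    nn.LogSoftmax(dim=1),
)

# Structure B: the same 100 hidden units split into two layers of 50 each.
two_hidden_layers = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 50),
    nn.ReLU(),
    nn.Linear(50, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1),
)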
