BME646 and ECE60146: Homework 3
1 Introduction
The main goal of this homework is for you to develop a greater appreciation for the step-size optimization logic that you will use for training deep neural networks. In the programming tasks, you will first run example scripts demonstrating the effects of using a vanilla Stochastic Gradient Descent (SGD) optimizer and see its shortcomings. Subsequently, you will be tasked with augmenting the vanilla SGD optimizer with momentum (SGD+) and with Adaptive Moment Estimation (Adam). For more information on the topics covered in this HW, please refer to Prof. Kak's slides on Autograd [1].
2 Becoming Familiar with the Primer
1. Download the tar.gz archive and install Version 1.0.9 of your instructor's ComputationalGraphPrimer. You may not want to install the Primer with sudo pip install, since that would not give you the Examples directory of the distribution that you are going to need for the homework.
Here is the main documentation page for the Primer:
https://engineering.purdue.edu/kak/distCGP/ComputationalGraphPrimer1.0.9.html
2. Go to the Examples directory of the distribution and execute the following scripts:
python3 one_neuron_classifier.py
python3 multi_neuron_classifier.py
The final output of both these scripts is a display of the training loss
versus the training iterations.
3. Now execute the following script in the Examples directory:
python3 verify_with_torchnn.py
Unless you make changes to the script in the Examples directory, the loss vs. iterations graph that you will see is for a network that is a torch.nn version of the handcrafted network you get through the script multi_neuron_classifier.py.
Compare mentally the output you get with the above call with what
you saw for the second script in Step 2.
4. Now make a couple of changes to the file verify_with_torchnn.py in order to see the torch.nn based output for the one-neuron model. The changes you need to make are mentioned in the documentation part of the file verify_with_torchnn.py.
Again, compare mentally the loss-vs-iterations plot for the one-neuron case with the handcrafted network vis-a-vis the torch.nn based network.
For both the one-neuron and the multi-neuron cases, you will see a dramatic improvement in performance with the torch.nn based implementations of the network. A significant portion of this improvement can be attributed to the use of step-size optimization in the torch.nn based code.
5. Now comes the hard part of this homework:
If you look at the code in Version 1.0.9 of the Primer, you will notice that it does NOT use any step-size optimization in SGD. In other words, the update steps in the Primer are based solely on the current value of the gradient of the loss with respect to the parameter in question. That is,

p_{t+1} = p_t - lr * g_{t+1}    (1)

where p_t denotes the learnable parameters from the previous time step, e.g., the layer weights at iteration t, and g_{t+1} denotes the corresponding gradient at the current time step t + 1.
Your work for this homework consists of adding step-size optimization to the training of the one-neuron and multi-neuron networks in the CGP Primer. In order to fully appreciate what that means, it is recommended that you carefully review the material in the section “Step Size Optimization for SGD” in the Week 3 slides by your instructor [1]. This section consists of Slides 103 through 115. As you will see in these slides, the two major components of step-size optimization are: (1) using momentum; and (2) adapting the step sizes to the gradient values of the different parameters. (The latter is also referred to as dealing with sparse gradients.) Both of these are incorporated in what is currently the world's most popular step-size optimizer: Adam (Adaptive Moment Estimation).
• SGD with Momentum (SGD+): In its simplest implementation, using momentum only involves remembering the step size used at the previous iteration and then making the current step-size decision based on the current value of the gradient and the previous value of the step size. In order to invoke the notion of momentum for step optimization, you have to compute the step updates separately for the individual learnable parameters. This makes it more convenient to base the current step size on both its previous value and the current value of the gradient, as shown below:

v_{t+1} = µ * v_t + g_{t+1},
p_{t+1} = p_t - lr * v_{t+1}.    (2)

In the formulas shown, the variable v is the step size and the first equation is the recursive formula for its update. As you can see, for calculating the step size to use at the current iteration t + 1, we use a fraction µ of its value at the previous iteration. The momentum scalar µ ∈ [0, 1] decides the weight given to the update from the previous time step. The initial value v_0 is typically set to all zeros.
• Adaptive Moment Estimation (Adam): During the last few years, Adam has become one of the most widely used step-size optimizers for SGD in deep learning, owing to its efficiency and robust performance, especially on large datasets. The key idea behind Adam is a joint estimation of the momentum term and the gradient-adaptation term in the calculation of the step sizes. It does so by keeping running averages of both the first and second moments of the gradients, and taking both moments into account when calculating the step size. The equations below demonstrate the key logic:

m_{t+1} = β_1 * m_t + (1 - β_1) * g_{t+1},
v_{t+1} = β_2 * v_t + (1 - β_2) * (g_{t+1})^2,
p_{t+1} = p_t - lr * m̂_{t+1} / (√(v̂_{t+1}) + ϵ),    (3)

where the definitions of the bias-corrected moments m̂ and v̂ can be found on Slide 115 of [1]. In practice, β_1 and β_2, which control the decay rates for the moments, are generally set to 0.9 and 0.99, respectively. A short code sketch of both of these update rules is given below.
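To make these update rules concrete, here is a minimal, illustrative sketch of vanilla SGD, SGD+, and Adam written for learnable parameters and gradients stored in plain Python dictionaries keyed by parameter name. None of the function or variable names below come from the Primer itself, and the bias corrections in the Adam function follow the standard formulation referenced on Slide 115 of [1]:

def sgd_step(params, grads, lr):
    # Vanilla SGD, Eq. (1):  p <- p - lr * g
    for name in params:
        params[name] = params[name] - lr * grads[name]

def sgd_plus_step(params, grads, velocities, lr, mu=0.9):
    # SGD with momentum, Eq. (2).  'velocities' holds one v per parameter
    # and should be initialized to zeros before training starts.
    for name in params:
        velocities[name] = mu * velocities[name] + grads[name]
        params[name] = params[name] - lr * velocities[name]

def adam_step(params, grads, m, v, t, lr, beta1=0.9, beta2=0.99, eps=1e-8):
    # Adam, Eq. (3).  'm' and 'v' hold the running first and second moments
    # (initialized to zeros); 't' is the 1-based iteration index used for
    # the standard bias correction (Slide 115 of [1]).
    for name in params:
        g = grads[name]
        m[name] = beta1 * m[name] + (1.0 - beta1) * g
        v[name] = beta2 * v[name] + (1.0 - beta2) * g * g
        m_hat = m[name] / (1.0 - beta1 ** t)    # bias-corrected first moment
        v_hat = v[name] / (1.0 - beta2 ** t)    # bias-corrected second moment
        params[name] = params[name] - lr * m_hat / (v_hat ** 0.5 + eps)

In your actual implementation, the dictionaries holding the velocities and the moments would typically be created in your subclass's constructor and updated inside whichever Primer method carries out the backpropagation and parameter update.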
3 Programming Task
• Your main programming task is two-fold: implementing SGD+ and Adam based on the basic SGD you see in one_neuron_classifier.py and multi_neuron_classifier.py.
As explained in Section 2, Steps 1-4 are for you to become familiar with Version 1.0.9 of the Primer. Prof. Kak's slides on Autograd explain the basic logic of the implementation code for one_neuron_classifier.py and multi_neuron_classifier.py. More specifically, your programming task is to create new versions of the one-neuron and multi-neuron classifiers that are based on SGD+ as well as Adam.
• Note that for the implementation of both SGD+ and Adam, modifying the main module file ComputationalGraphPrimer.py is NOT recommended. Instead, you should create subclasses that inherit from the ComputationalGraphPrimer class provided by the module. In your subclasses, create or override any class methods as your implementation requires; a minimal subclassing sketch is shown at the end of this section. Also, it should be stressed that you are not allowed to use PyTorch's built-in SGD optimizer.
• Fig. 1 shows an example of the comparative plots for the one-neuron classifier. This plot is shown just to give you an idea of the improvement achieved by SGD+ over SGD. Your results could vary based on your choice of parameters, such as the learning rate, µ, batch size, number of iterations, etc.
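As a starting point, the sketch below shows one possible way to organize such a subclass for SGD+ applied to the one-neuron case. The overridden method name update_params, its arguments, and the way the learnable parameters are accessed are hypothetical placeholders: look up the actual name and signature of the method in ComputationalGraphPrimer.py that applies the gradient update to the learnable parameters, and override that one.

from ComputationalGraphPrimer import ComputationalGraphPrimer

class OneNeuronSGDPlus(ComputationalGraphPrimer):
    # One-neuron classifier trained with SGD+ (momentum), per Eq. (2).

    def __init__(self, momentum=0.9, **kwargs):
        super().__init__(**kwargs)     # forward the usual Primer constructor options
        self.momentum = momentum
        self.velocities = {}           # one running step size v per learnable parameter

    # Hypothetical method name -- in your code, override whichever Primer
    # method actually updates the learnable parameters from the gradients.
    def update_params(self, params, grads, lr):
        for name, g in grads.items():
            v_new = self.momentum * self.velocities.get(name, 0.0) + g    # Eq. (2), first line
            self.velocities[name] = v_new
            params[name] = params[name] - lr * v_new                      # Eq. (2), second line

Your Adam subclass can be structured the same way, except that it would keep two running dictionaries (for the first and second moments) and an iteration counter for the bias correction, and apply Eq. (3) in the overridden update method.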
4 Submission Instructions
Include a typed report explaining how you solved the given programming tasks.
1. Your pdf must include:
• A description of both SGD+ and Adam in your own words with
key equations.
Figure 1: Sample comparative plot (SGD+ vs. SGD) for the one-neuron network. Your results could vary depending on your choice of the training parameters. All plot-formatting options are also flexible.
• For the one-neuron classifier, a plot of training loss vs. iterations comparing all three optimizers (SGD, SGD+, Adam), plus another version of the same plot with a different learning rate of your choice.
• The same comparative plots, with two different learning rates, for the multi-neuron classifier.
• A discussion, in one or two paragraphs, of your findings when comparing the performance of the three optimizers.
• Your source code. Make sure that your source code files are
adequately commented and cleaned up.
2. Turn in a zipped file. It should include (a) a typed, self-contained pdf report with source code and results, and (b) your source code files (only .py files are accepted). Rename your .zip file as hw3 <First Name><Last Name>.zip and follow the same file naming convention for your pdf report too.
3. For all homeworks, you are encouraged to use .ipynb for development
and the report. If you use .ipynb, please convert it to .py and submit
that as source code.
4. You can resubmit a homework assignment as many times as you want
up to the deadline. Each submission will overwrite any previous
submission. If you are submitting late, do it only once on
BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.
5. The sample solutions from previous years are for reference only. Your
code and final report must be your own work.
6. To help us provide better feedback to you, make sure to number your figures.
References
[1] Autograd for Automatic Differentiation and for Auto-Construction of Computational Graphs. URL: https://engineering.purdue.edu/DeepLearn/pdf-kak/AutogradAndCGP.pdf