CS 446 / ECE 449 — Homework 3
Version 1.1
Instructions.
• Everyone must submit individually on Gradescope under hw3 and hw3code. Problem parts are marked
with [hw3] and [hw3code] to indicate where they are handed in.
• The “written” submission at hw3 must be typed, and submitted in any format Gradescope accepts
(to be safe, submit a PDF). You may use LaTeX, Markdown, Google Docs, MS Word, whatever you like;
but it must be typed!
• When submitting at hw3, Gradescope will ask you to select pages for each problem; please do this
precisely!
• Please make sure your NetID is clear and large on the first page of the homework.
• Your solution must be written in your own words.
• Coding problems come with suggested “library routines”; we include these to reduce the time you spend
searching through APIs, but you are free to use other APIs.
• When submitting to hw3code, upload hw3.py. Don’t upload a zip file or additional files.
Version history.
1.0. Initial version.
1.1. Clarify to use SGD in Problem 1(c) and Problem 1(d).
1. ResNet.
In this problem, you will implement a simplified ResNet. You do not need to change arguments that are
not mentioned here (though you are of course free to experiment and see what happens).
(a) [hw3code] Implement a class Block, which is a building block of ResNet. It is described in Figure 2
of He et al. (2016), but also as follows.
The input to Block is of shape (N, C, H, W), where N denotes the batch size, C denotes the number
of channels, and H and W are the height and width of each channel. For each data example x with
shape (C, H, W), the output of block is
Block(x) = σr(x + f(x)),
where σr denotes the ReLU activation, and f(x) also has shape (C, H, W) and thus can be added to
x. In detail, f contains the following layers.
i. A Conv2d with C input channels, C output channels, kernel size 3, stride 1, padding 1, and no
bias term.
ii. A BatchNorm2d with C features.
iii. A ReLU layer.
iv. Another Conv2d with the same arguments as i above.
v. Another BatchNorm2d with C features.
Because 3 × 3 kernels and padding 1 are used, the convolutional layers do not change the shape of
each channel. Moreover, the number of channels is also unchanged. Therefore f(x) has the same shape
as x.
Additional instructions are given in docstrings in hw3.py; see also the sketch below.
Library routines: torch.nn.Conv2d and torch.nn.BatchNorm2d.
Remark: Use bias=False for the Conv2d layers.
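For orientation, here is one way such a block could look in PyTorch. This is only a minimal sketch under
assumed names (the class name Block and the constructor argument num_channels are illustrative); the
authoritative interface is the one specified in the docstrings of hw3.py.

import torch

class Block(torch.nn.Module):
    # Residual block: Block(x) = ReLU(x + f(x)), where f is
    # conv -> batchnorm -> ReLU -> conv -> batchnorm.
    def __init__(self, num_channels):
        super().__init__()
        # 3x3 convolutions with stride 1 and padding 1 preserve the (C, H, W) shape.
        self.conv1 = torch.nn.Conv2d(num_channels, num_channels,
                                     kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(num_channels)
        self.conv2 = torch.nn.Conv2d(num_channels, num_channels,
                                     kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = torch.nn.BatchNorm2d(num_channels)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # residual branch f(x)
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        # skip connection followed by the outer ReLU
        return self.relu(x + out)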
(b) [hw3code] Implement a (shallow) ResNet that consists of the following parts:
i. A Conv2d with 1 input channel, C output channels, kernel size 3, stride 2, padding 1, and no
bias term.
ii. A BatchNorm2d with C features.
iii. A ReLU layer.
iv. A MaxPool2d with kernel size 2.
v. A Block with C channels.
vi. An AdaptiveAvgPool2d which for each channel takes the average of all elements.
vii. A Linear with C inputs and 10 outputs.
Additional instructions are given in docstrings in hw3.py; see also the sketch below.
Library routines: torch.nn.Conv2d, torch.nn.BatchNorm2d, torch.nn.MaxPool2d,
torch.nn.AdaptiveAvgPool2d and torch.nn.Linear.
Remark: Use bias=False for the Conv2d layer.
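As with part (a), the following sketch shows one possible arrangement of these layers; the class and
argument names are assumptions, and the required signature is the one in the docstrings of hw3.py. The
Flatten layer is included only to turn the (N, C, 1, 1) pooled output into (N, C) before the final Linear.

import torch

class ResNet(torch.nn.Module):
    # Shallow ResNet: conv -> batchnorm -> ReLU -> maxpool -> Block
    # -> adaptive average pool -> linear.
    def __init__(self, num_channels, num_classes=10):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(1, num_channels, kernel_size=3,
                            stride=2, padding=1, bias=False),
            torch.nn.BatchNorm2d(num_channels),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
            Block(num_channels),                 # residual block from part (a)
            torch.nn.AdaptiveAvgPool2d(1),       # average over each channel
            torch.nn.Flatten(),                  # (N, C, 1, 1) -> (N, C)
            torch.nn.Linear(num_channels, num_classes),
        )

    def forward(self, x):
        return self.net(x)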
(c) [hw3] Train your ResNet implemented in (b) with the different choices C ∈ {1, 2, 4} on the digits data,
and plot the training error vs. test error curves. To make your life easier, we provide starter code that
loads the digits data and draws the figures for the different choices of C; therefore, you only need to
write the code that trains your ResNet in the function plot_resnet_loss_1(). Train your network for
4000 epochs using SGD with mini-batch size 128 and step size 0.1. See the docstrings in hw3.py for
more details, and the illustrative training-loop sketch below. Include the resulting plot in your written handin.
For full credit, in addition to including the six train and test curves, include at least one complete
sentence describing how the train and test error (and in particular their gap) change with C, which
itself corresponds to a notion of model complexity as discussed in lecture.
Library routines: torch.nn.CrossEntropyLoss, torch.autograd.backward, torch.no_grad,
torch.optim.Optimizer.zero_grad, torch.autograd.grad, torch.nn.Module.parameters.
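The sketch below shows one possible SGD training loop in the spirit of this part. The function name and
argument names are assumptions made for illustration (the real entry point is plot_resnet_loss_1() in
hw3.py), and drawing one mini-batch per epoch is only one reading of the instructions; follow the
docstrings for the required behavior.

import torch

def train_sgd(model, X_train, y_train, X_test, y_test,
              epochs=4000, batch_size=128, lr=0.1):
    # Illustrative SGD loop; names and the mini-batch sampling scheme are assumptions.
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    n = X_train.shape[0]
    train_errs, test_errs = [], []
    for epoch in range(epochs):
        model.train()
        idx = torch.randperm(n)[:batch_size]   # draw one mini-batch of size 128
        optimizer.zero_grad()
        loss = loss_fn(model(X_train[idx]), y_train[idx])
        loss.backward()
        optimizer.step()
        # record classification error on the full train and test sets
        model.eval()
        with torch.no_grad():
            train_errs.append((model(X_train).argmax(dim=1) != y_train).float().mean().item())
            test_errs.append((model(X_test).argmax(dim=1) != y_test).float().mean().item())
    return train_errs, test_errs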