Intro to Image Understanding (CSC420)
Assignment 2
Total: 200 marks
General Instructions:
• You are allowed to work directly with one other person to discuss the questions. However, you are still expected to write the solutions/code/report in your own words; i.e.
no copying. If you choose to work with someone else, you must indicate this in your
assignment submission. For example, on the first line of your report file (after your
own name and information, and before starting your answer to Q1), you should have a
sentence that says: “In solving the questions in this assignment, I worked together with
my classmate [name & student number]. I confirm that I have written the solutions/code/report in my own words”.
• Your submission should be in the form of an electronic report (PDF), with the answers
to the specific questions (each question separately), and a presentation and discussion
of your results. For this, please submit a file named report.pdf to MarkUs directly.
• Submit the documented code that you have written to generate your results separately.
Please store all of those files in a folder called assignment2, zip the folder and then
submit the file assignment2.zip to MarkUs. You should include a README.txt
file (inside the folder) which details how to run the submitted code.
• Do not worry if you realize you made a mistake after submitting your zip file; you can
submit multiple times on MarkUs.
Part I: Theoretical Problems (80 marks)
[Question 1] Activation Functions (10 marks)
Show that in a fully connected neural network with linear activation functions, the number of layers has effectively no impact on the network.
Hint: Express the output of the network as a function of its inputs and the weights of its layers.
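For instance, a minimal sketch of the two-layer case with weight matrices W1 and W2 and no biases, which you should generalize to any number of layers and to the case with biases:

    ŷ = W2 (W1 x) = (W2 W1) x = W x,  where W = W2 W1,

so the two layers act exactly like a single linear layer with weights W.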
[Question 2] Back Propagation (15 marks)
Consider the neural network architecture shown in the Figure below. Nodes denoted by
xi are input variables, and ŷ is the output variable. The node Σ takes the sum of its inputs,
and σ denotes the logistic function

    σ(x) = 1 / (1 + e^(−x)).
Suppose the loss function used for training this neural network is the L2 loss, i.e.
L(y, ŷ) = ||y − ŷ||₂². Assume that the network has its weights set as:
(w1, w2, w3, w4, w5, w6) = (0.75, −0.63, 0.24, −1.7, 0.8, −0.2)
Given an input data point (x1, x2, x3, x4) = (0.9, −1.1, −0.3, 0.8) with true label 0.5, compute
the partial derivative ∂L/∂w3 by using the back-propagation algorithm. Round all your
calculations to 4 decimal places.
Hint: The derivative of the L2 loss ||y − ŷ||₂² with respect to ŷ is 2(ŷ − y), and you do not need to
write code for this question! You can do it by hand.
[Figure: network diagram with inputs x1, x2, x3, x4, weights w1, ..., w6, three Σ → σ units, and output ŷ.]
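As a reminder (the exact chain of factors depends on how w3 is wired in the figure): if w3 connects an input xk to a hidden unit with pre-activation z and output h = σ(z), the chain rule gives

    ∂L/∂w3 = (∂L/∂ŷ) · (∂ŷ/∂h) · σ′(z) · xk.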
[Question 3] Convolutional Neural Network (15 marks)
In this problem, our goal is to estimate the computation overhead of CNNs by counting
the FLOPs (floating point operations). Consider a convolutional layer C followed by a max
pooling layer P. The input of layer C has 50 channels, each of which is of size 12×12. Layer
C has 20 filters, each of which is of size 4 × 4. The convolution padding is 1 and the stride is
2. Layer P performs max pooling over each of layer C’s output feature maps, with 3 × 3 local
receptive fields and stride 1.
Given scalar inputs x1, x2, ..., xn, we assume:
• A scalar multiplication xi · xj accounts for one FLOP.
• A scalar addition xi + xj accounts for one FLOP.
• A max operation max(x1, x2, ..., xn) accounts for n − 1 FLOPs.
• All other operations do not account for FLOPs.
How many FLOPs do layers C and P perform in total during one forward pass, with and without
accounting for the bias terms?
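If you find it helpful to organize the counting in code, here is a minimal Python sketch under the conventions above; the function names and the example call are illustrative only, not part of the required solution:

    # Sketch: FLOP counting for a conv layer followed by max pooling,
    # using the conventions listed above. Names are illustrative only.

    def conv_output_size(in_size, kernel, padding, stride):
        # Standard formula for the spatial size of a convolution output.
        return (in_size + 2 * padding - kernel) // stride + 1

    def conv_flops(in_channels, in_size, out_channels, kernel, padding, stride, with_bias):
        out_size = conv_output_size(in_size, kernel, padding, stride)
        # Per output element: kernel*kernel*in_channels multiplications and
        # (kernel*kernel*in_channels - 1) additions, plus 1 addition for the bias.
        mults = kernel * kernel * in_channels
        adds = mults - 1 + (1 if with_bias else 0)
        return out_channels * out_size * out_size * (mults + adds), out_size

    def maxpool_flops(channels, in_size, kernel, stride):
        out_size = conv_output_size(in_size, kernel, 0, stride)
        # Each max over kernel*kernel values costs kernel*kernel - 1 FLOPs.
        return channels * out_size * out_size * (kernel * kernel - 1)

    # Example call for this question (layer C: 50 -> 20 channels, 4x4 kernel, pad 1, stride 2).
    c_flops, c_out_size = conv_flops(50, 12, 20, 4, 1, 2, with_bias=True)
    p_flops = maxpool_flops(20, c_out_size, 3, 1)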
[Question 4] Trainable Parameters (15 marks)
The following CNN architecture is one of the most influential architectures presented in the 1990s. Count the total number of trainable parameters in this network. Note
that the Gaussian connections in the output layer can be treated as a fully connected layer
similar to F6.
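Hint: a convolutional layer with Cin input channels, Cout filters of size k × k, and biases has (k × k × Cin + 1) × Cout trainable parameters, and a fully connected layer mapping nin inputs to nout outputs with biases has (nin + 1) × nout.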
[Question 5] Logistic Activation Function (10 marks)
Show that for backpropagation in a neural network with the logistic activation function, as long
as we have the outputs of the neurons, there is no need for their inputs.
Hint: Find the derivative of a neuron’s output with respect to its inputs.
[Question 6] Hyperbolic Tangent Activation Function (15 marks)
One alternative to the logistic activation function is the hyperbolic tangent function:

    tanh(x) = (1 − e^(−2x)) / (1 + e^(−2x)).
• (a) What is the output range of this function, and how does it differ from the output range
of the logistic function?
• (b) Show that its gradient can be formulated as a function of the logistic function; and
• (c) When do we want to use each of these activation functions?
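Hint: for part (b), you may find the identity tanh(x) = 2σ(2x) − 1 useful.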
Part II: Implementation Tasks (120 marks)
In this part, the goal is to investigate the performance of a simple neural network in
classifying different letters. All the tasks should be implemented using only Python, NumPy,
and a deep learning package of your choice; e.g. PyTorch, TensorFlow, etc.
Task I - Data Preparation (15 marks):
The dataset that we will use in this assignment is a permuted version of notMNIST
(http://yaroslavvb.blogspot.ca/2011/09/notmnist-dataset.html), which contains 28 × 28 images
of 10 letters (A to J) taken from different fonts. This dataset has 18720 instances, which you
should divide into 3 different sets: 15000 images for the training set, 1000 images for the
validation set, and the remaining images for the test set.
It is important to have balanced divisions for each of your classes to avoid any bias. Load
the data into proper data structures; e.g. if you are using PyTorch, load them into tensors.
Apply any required transformations to the images. Make sure to include enough justification
for your choice of preprocessing pipeline. You may want to come back to this step as you
go through the training. Remember that preprocessing may have an enormous impact on the
training results. Include your code snippet for this part in the report.
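For illustration, here is a minimal PyTorch sketch of one possible pipeline; the placeholder arrays and the simple scaling to [0, 1] are assumptions made for the sketch, to be replaced with the actual notMNIST data you load and the preprocessing you justify in your report:

    import numpy as np
    import torch

    rng = np.random.default_rng(0)

    # Placeholder arrays for illustration only; replace them with the actual notMNIST
    # images (18720 x 28 x 28) and labels (18720,) that you load from the provided files.
    images = rng.integers(0, 256, size=(18720, 28, 28), dtype=np.uint8)
    labels = rng.integers(0, 10, size=18720, dtype=np.int64)

    # Shuffle once so the splits are random but reproducible; a stratified split
    # would additionally guarantee balanced classes in each set.
    perm = rng.permutation(len(images))
    images, labels = images[perm], labels[perm]

    # Scale pixel values to [0, 1] and flatten each image to a 784-dimensional vector.
    x = torch.tensor(images, dtype=torch.float32).reshape(-1, 28 * 28) / 255.0
    y = torch.tensor(labels, dtype=torch.long)

    # 15000 / 1000 / remaining split, as required above.
    x_train, y_train = x[:15000], y[:15000]
    x_val, y_val = x[15000:16000], y[15000:16000]
    x_test, y_test = x[16000:], y[16000:]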
For all the tasks that follow, use ReLU as your activation function and cross entropy as
your cost function. You should also apply a softmax layer to your output layer. To give you an
estimate of the running time: training the following neural networks will not take more
than an hour on an Intel Core i7 CPU @ 1.73 GHz with 4GB RAM. That means you should be
able to quickly train and test the networks using your own personal machine, Google Colab,
or the teach.cs machines.
Task II - Neural Network Training (25 marks):
Implement a simple neural network with one hidden layer and 1000 hidden units. Train
your neural network with the aforementioned characteristics. For training the network, you
are supposed to find a reasonable value for your learning rate. As we always do when picking
the best value for a hyperparameter (and the learning rate is one of them), you should train
your neural network with several different learning rates and choose the one that gives you the
lowest validation error. Trying 5 different values will be enough. You may also find it useful
to decay the weights as the training procedure goes on. Plot the training, validation,
and test errors vs. the number of epochs. Make sure to use early stopping for your training to
avoid overfitting. After training stops, what are the training, test, and
validation errors?
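A minimal PyTorch sketch of this setup, reusing the x_train/y_train, x_val/y_val splits from the Task I sketch; the candidate learning rates, batch size, number of epochs, and early-stopping patience below are illustrative choices, not requirements:

    import torch
    import torch.nn as nn

    def make_model(hidden_units=1000):
        # One hidden layer with ReLU; nn.CrossEntropyLoss applies log-softmax internally,
        # so the softmax on the output does not need to be an explicit module here.
        return nn.Sequential(
            nn.Linear(28 * 28, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, 10),
        )

    def error_rate(model, x, y):
        # Fraction of misclassified examples.
        model.eval()
        with torch.no_grad():
            pred = model(x).argmax(dim=1)
        return (pred != y).float().mean().item()

    def train(model, x_tr, y_tr, x_va, y_va, lr, epochs=100, batch_size=128, patience=5):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        best_val, best_state, bad_epochs = float("inf"), None, 0
        for epoch in range(epochs):
            model.train()
            perm = torch.randperm(len(x_tr))
            for i in range(0, len(x_tr), batch_size):
                idx = perm[i:i + batch_size]
                opt.zero_grad()
                loss = loss_fn(model(x_tr[idx]), y_tr[idx])
                loss.backward()
                opt.step()
            val_err = error_rate(model, x_va, y_va)
            # Early stopping: keep the weights with the lowest validation error so far.
            if val_err < best_val:
                best_val = val_err
                best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
                bad_epochs = 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        if best_state is not None:
            model.load_state_dict(best_state)
        return best_val

    # Try a handful of learning rates and keep the one with the lowest validation error.
    results = {lr: train(make_model(1000), x_train, y_train, x_val, y_val, lr=lr)
               for lr in [1.0, 0.3, 0.1, 0.03, 0.01]}
    best_lr = min(results, key=results.get)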
Task III - Testing the Number of Hidden Units (25 marks):
Instead of using 1000 hidden units, train three different neural networks with 100, 500, and
1000 hidden units, respectively. Find the best validation error for each one. Choose the
model which gives you the best result, and then use it for classifying the test set. Include all
the validation errors and the final test error in your report. In one sentence, summarize your
observation about the effect of the number of hidden units on the final results.
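Reusing the illustrative make_model, train, and error_rate helpers from the Task II sketch (and the best_lr chosen there), the comparison can be organized as:

    # Compare validation errors across hidden-layer sizes, then evaluate the best model on the test set.
    models = {h: make_model(h) for h in [100, 500, 1000]}
    val_errors = {h: train(m, x_train, y_train, x_val, y_val, lr=best_lr) for h, m in models.items()}
    best_h = min(val_errors, key=val_errors.get)
    test_error = error_rate(models[best_h], x_test, y_test)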
Task IV - Testing the Number of Layers (25 marks):
For this task, fix the total number of hidden units to 1000. This time, train a neural network
with two hidden layers that have the same number of hidden units, so each layer has 500 units.
Plot the training and validation errors vs. the number of epochs. What is the
final validation error when training is done? Using the test set, compare this architecture
with the one-hidden-layer case.
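A sketch of the two-hidden-layer model, trainable with the same illustrative train helper from the Task II sketch:

    # Two hidden layers of 500 units each (1000 hidden units in total), ReLU activations.
    two_layer_model = nn.Sequential(
        nn.Linear(28 * 28, 500),
        nn.ReLU(),
        nn.Linear(500, 500),
        nn.ReLU(),
        nn.Linear(500, 10),
    )
    val_err_two_layer = train(two_layer_model, x_train, y_train, x_val, y_val, lr=best_lr)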
Task V - Dropout (30 marks):
Dropout is a powerful technique to reduce overfitting and enhance the overall performance of a neural network. Using the same architecture as in Task II, introduce dropout
on the hidden layer of the neural network (with rate 0.5) and train the network. You should
apply dropout only during training, not during evaluation (i.e. the forward passes used for
validation and testing). You can use any existing implementation of dropout; e.g. PyTorch has
functions that incorporate dropout in the training. Plot the training and validation errors vs.
the number of epochs. Compare the results with the case without dropout. In one sentence,
summarize your observation about the effect of dropout.
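A sketch of adding dropout to the Task II model; nn.Dropout is active only in training mode (after model.train()) and is disabled by model.eval(), which the train and error_rate helpers in the Task II sketch already call for validation and testing:

    # Same one-hidden-layer architecture as Task II, with dropout (rate 0.5) on the hidden layer.
    dropout_model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(1000, 10),
    )
    val_err_dropout = train(dropout_model, x_train, y_train, x_val, y_val, lr=best_lr)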