CS 489/698 Neural Networks Assignment 4
Autoencoders and RNNs
What you need to get
• YOU_a4.ipynb: a Python notebook (hereafter called “the notebook”)
• Network.py: Module containing Network class
• mnist_loader.py: Module for reading in MNIST data
• mnist.pkl: MNIST data
• origin_of_species.txt: a text file
What you need to do
1. Autoencoder [13 marks total]
In this question, you will create an autoencoder and train it on the MNIST digits.
(a) [4 marks] Consider the cosine proximity loss function

    C(\vec{y}, \vec{t}) = -\frac{\vec{y} \cdot \vec{t}}{\|\vec{y}\| \, \|\vec{t}\|}
It is the negative of the cosine of the angle between \vec{y} and \vec{t}. Based on that loss function, we
can define the cosine proximity cost as the expected loss,

    E(Y, T) = \left\langle C(\vec{y}, \vec{t}) \right\rangle_{\vec{y} \in Y,\, \vec{t} \in T}
Find a formula for the gradient of the cost function with respect to the output vector y. That is, find
a formula for ∂E/∂y. Simplify the formula as much as you can.
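A quick way to validate whatever formula you derive is a finite-difference check. The sketch below is for that sanity check only; cosine_proximity and numerical_grad are hypothetical helper names, not part of the supplied code.

    import numpy as np

    def cosine_proximity(y, t):
        # Loss for a single sample: negative cosine of the angle between y and t.
        return -np.dot(y, t) / (np.linalg.norm(y) * np.linalg.norm(t))

    def numerical_grad(y, t, eps=1e-6):
        # Central-difference estimate of dC/dy, one component at a time.
        g = np.zeros_like(y)
        for k in range(len(y)):
            yp, ym = y.copy(), y.copy()
            yp[k] += eps
            ym[k] -= eps
            g[k] = (cosine_proximity(yp, t) - cosine_proximity(ym, t)) / (2 * eps)
        return g

    rng = np.random.default_rng(0)
    y, t = rng.standard_normal(5), rng.standard_normal(5)
    # Your analytic gradient from (a) should agree with this to several decimals.
    print(numerical_grad(y, t))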
(b) [3 marks] Complete the function CosineProximity_p. It computes ∂E/∂y for the entire batch.
See the function’s documentation for more details.
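For reference, one possible NumPy sketch of the batched gradient is below, assuming one sample per row; the function’s actual docstring in the notebook is authoritative for shapes and any averaging convention, and the closed form used here is the one you should derive yourself in part (a).

    import numpy as np

    def CosineProximity_p(Y, T):
        # Sketch: row-wise gradient of the cosine proximity cost, dE/dY.
        # Assumes Y and T hold one sample per row; check the notebook's
        # docstring for the real shape and averaging conventions.
        ny = np.linalg.norm(Y, axis=1, keepdims=True)    # ||y|| per row
        nt = np.linalg.norm(T, axis=1, keepdims=True)    # ||t|| per row
        dot = np.sum(Y * T, axis=1, keepdims=True)       # y . t per row
        return -T / (ny * nt) + dot * Y / (ny**3 * nt)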
(c) [4 marks] Create a 3-layer autoencoder neural network and train it on 10,000 digits from the
MNIST dataset. Your network’s input layer should have 784 neurons, and its output layer should
have 784 neurons that use the identity activation function. The hidden layer should have only
50 logistic neurons. Use stochastic gradient descent to minimize the cosine proximity loss
function for at least 20 epochs with a learning rate of 1. The batch size should be between 30 and
130. You should use the supplied Network class.
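A minimal sketch of one possible setup follows. Every call signature in it is an assumption; the real loader, constructor, and SGD interfaces are defined in mnist_loader.py and Network.py, so adapt accordingly.

    import mnist_loader
    from Network import Network

    # Hypothetical loader call; see mnist_loader.py for the real API.
    images = mnist_loader.load_images('mnist.pkl')[:10000]
    data = [(x, x) for x in images]     # autoencoder: the target is the input

    # Hypothetical constructor: 784 inputs, 50 logistic hidden, 784 identity outputs.
    net = Network([784, 50, 784])
    net.SGD(data, epochs=20, lrate=1.0, batch_size=100)  # batch size in [30, 130]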
(d) [2 marks] Show that your hidden layer successfully encodes the digits by encoding and reconstructing at least one sample of each digit class (0 through 9).
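One way to present this is a two-row figure of originals and reconstructions. In the sketch below, images and labels are assumed to exist from part (c)’s cells, and net.FeedForward is an assumed method name for a forward pass.

    import numpy as np
    import matplotlib.pyplot as plt

    samples = {}
    for x, lbl in zip(images, labels):
        samples.setdefault(lbl, x)              # first example of each class

    fig, ax = plt.subplots(2, 10, figsize=(12, 3))
    for d in range(10):
        recon = net.FeedForward(samples[d])     # assumed method name
        ax[0, d].imshow(np.reshape(samples[d], (28, 28)), cmap='gray')
        ax[1, d].imshow(np.reshape(recon, (28, 28)), cmap='gray')
        ax[0, d].set_axis_off()
        ax[1, d].set_axis_off()
    plt.show()   # top row: originals; bottom row: reconstructions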
2. Backprop Through Time [10 marks total]
The figure below shows an RNN. Note that
s = Ux + W h + b
h = σ(s)
z = V h + c
y = σ(z)
Notice that we are using the mathematical convention of assuming
vectors are column-vectors by default.
For the following questions, assume you are given a dataset that
has many samples of sequences of inputs and output targets. Each
sequence consists of inputs x^i, for i = 1, ..., τ, that produce a sequence of network outputs y^i,
which you wish to match to a corresponding sequence of targets, t^i.
The cost function for such a sequence is
    E(y^1, \ldots, y^\tau, t^1, \ldots, t^\tau) = \sum_{i=1}^{\tau} C(y^i, t^i)
[Figure: RNN diagram. The input x connects to the pre-activation s through weights U; the hidden state h feeds back into s through recurrent weights W; h connects to the output pre-activation z through weights V; h = σ(s) and y = σ(z).]
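The four equations above fully determine the forward pass once they are unrolled in time, with h^{i-1} feeding into step i. The following minimal NumPy sketch (variable names illustrative only) may help make that structure concrete:

    import numpy as np

    def sigma(v):
        # Placeholder activation; substitute whichever sigma the network uses.
        return 1.0 / (1.0 + np.exp(-v))

    def forward(xs, U, W, V, b, c, h0):
        # Unroll s = Ux + Wh + b, h = sigma(s), z = Vh + c, y = sigma(z)
        # over the input sequence xs, caching everything for BPTT.
        h, ss, hs, zs, ys = h0, [], [], [], []
        for x in xs:
            s = U @ x + W @ h + b
            h = sigma(s)
            z = V @ h + c
            y = sigma(z)
            ss.append(s); hs.append(h); zs.append(z); ys.append(y)
        return ys, hs, ss, zs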
(a) [3 marks] Show that the gradient of the cost with respect to the weights V can be written
    \frac{\partial E}{\partial V} = \sum_{i=1}^{\tau} \left( \frac{\partial C(y^i, t^i)}{\partial y^i} \odot \sigma'(z^i) \right) \left( h^i \right)^\top
(b) [2 marks] Suppose you have computed ∂E/∂h^i for i = 1, ..., τ. Show that

    \frac{\partial E}{\partial U} = \sum_{i=1}^{\tau} \left( \frac{\partial E}{\partial h^i} \odot \sigma'(s^i) \right) \left( x^i \right)^\top
(c) [4 marks] Also, show that
    \frac{\partial E}{\partial W} = \sum_{i=1}^{\tau-1} \left( \frac{\partial E}{\partial h^{i+1}} \odot \sigma'(s^{i+1}) \right) \left( h^i \right)^\top
(d) [1 mark] Finally, show that
    \frac{\partial E}{\partial b} = \sum_{i=1}^{\tau} \frac{\partial E}{\partial h^i} \odot \sigma'(s^i)
3. Recurrent Neural Network [14 marks total]
In this question, you will complete the Python implementation of backprop through time (BPTT)
for a simple recurrent neural network (RNN). The notebook contains a definition for the class RNN.
The class has a number of methods, including BPTT. However, BPTT is incomplete.
For training and testing, the notebook also reads in a corpus of text (a simplified version of On the
Origin of Species by Charles Darwin), along with the character set, and creates about 5000 training
samples. The notebook also creates a few utility functions that help convert between the various
formats for the data.
(a) [8 marks] Implement the function BPTT so that it computes the gradients of the loss with
respect to the connection weight matrices and the biases. Your code should work for different
values of seq_length (this is the same as τ in the lecture notes).
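As a hedged starting point, the accumulation formulas from Question 2 translate to NumPy roughly as below. Every name is a placeholder for the RNN class’s actual caches; you still have to build dEdh by propagating the error backward through time; and Question 2 assumes an elementwise σ at the output, whereas this network’s softmax needs the usual softmax-plus-cross-entropy treatment.

    import numpy as np

    def bptt_sketch(xs, hs, ss, zs, dEdy, dEdh, sigma_p):
        # xs, hs, ss, zs: lists of (n,1) column vectors cached in the forward
        # pass; dEdy, dEdh: per-step gradients; sigma_p: activation derivative.
        tau = len(xs)
        dV = sum((dEdy[i] * sigma_p(zs[i])) @ hs[i].T for i in range(tau))
        dU = sum((dEdh[i] * sigma_p(ss[i])) @ xs[i].T for i in range(tau))
        dW = sum((dEdh[i+1] * sigma_p(ss[i+1])) @ hs[i].T for i in range(tau - 1))
        db = sum(dEdh[i] * sigma_p(ss[i]) for i in range(tau))
        dc = sum(dEdy[i] * sigma_p(zs[i]) for i in range(tau))
        return dU, dW, dV, db, dc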
(b) [2 marks] Create an instance of the RNN class. The hidden layer should have 400 ReLU
neurons. The input to the network is a one-hot vector with 27 elements, one for each character
in our character set. The output layer also has 27 neurons, with a softmax activation function.
(c) [2 marks] Train the RNN for about 15 epochs. Use categorical cross entropy as a loss function
(see A2 Q2 for help with this). You can use a learning rate of 0.001, but you might want to break
the training into 5-epoch segments, reducing the learning rate for each segment. Whatever
works.
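A sketch covering parts (b) and (c) together is below; the constructor arguments and Train method are assumptions, so match them to the RNN class defined in the notebook.

    # Hypothetical constructor: 27 one-hot inputs, 400 ReLU hidden, 27 softmax outputs.
    rnn = RNN(dims=(27, 400, 27), hidden_act='ReLU', output_act='softmax')

    lrate = 0.001
    for segment in range(3):                  # 3 segments x 5 epochs = 15 epochs
        rnn.Train(train_samples, epochs=5, lrate=lrate)   # assumed signature
        lrate /= 2                            # shrink the step between segments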
(d) [2 marks] What fraction of the time does your RNN correctly guess the first letter that follows
the input? Write a small bit of Python code that counts how many times the next character is
correct, and express your answer as a percentage in a print statement.
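A counting loop along these lines works; rnn.Predict returning the output sequence, and samples holding (inputs, targets) pairs of one-hot vectors, are both assumptions about the notebook’s interfaces.

    import numpy as np

    correct = 0
    for xs, ts in samples:
        ys = rnn.Predict(xs)                          # assumed method name
        if np.argmax(ys[0]) == np.argmax(ts[0]):      # first following character
            correct += 1
    print(f'Next-character accuracy: {100 * correct / len(samples):.1f}%')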
What to submit
Your assignment submission should be a single Jupyter notebook file, named <WatIAM>_a4.ipynb,
where <WatIAM> is your UW WatIAM login ID (not your student number). The notebook must include
solutions to all the questions. Submit this file to Desire2Learn. You do not need to submit any of the
modules supplied for the assignment.