Programming Exercise 5:
Neural Networks Learning
Machine Learning
Introduction
In this exercise, you will implement the backpropagation algorithm for neural
networks and apply it to the task of hand-written digit recognition. To get started
with the exercise, you will need to download the starter code and unzip its contents
to the directory where you wish to complete the exercise. If needed, use the cd
command in Octave/MATLAB to change to this directory before starting this
exercise.
You can log into your CougarNet and download MATLAB from this website:
https://uh.edu/software-downloads/index.php.
Files included in this exercise
ex5.m - Octave/MATLAB script that steps you through the exercise
ex5data1.mat - Training set of hand-written digits
ex5weights.mat - Neural network parameters for exercise
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
computeNumericalGradient.m - Numerically compute gradients
checkNNGradients.m - Function to help check your gradients
debugInitializeWeights.m - Function for initializing weights
predict.m - Neural network prediction function
[y] sigmoidGradient.m - Compute the gradient of the sigmoid function
[y] randInitializeWeights.m - Randomly initialize weights
[y] nnCostFunction.m - Neural network cost function
[y] indicates files you will need to complete
Files to submit
[1] ML_ex5 – Include all the code (you need to complete sigmoidGradient.m, randInitializeWeights.m and nnCostFunction.m by yourself)
[2] ex5_report – Give the answers to the following questions directly:
(1) Feedforward cost without regularization
(2) Feedforward cost with regularization (lambda = 1)
(3) Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]
(4) Train without regularization, predict the labels of the training set,
get the training set accuracy (Default MaxIter = 50)
(5) (Optional) Train with regularization (lambda = 1), predict the
labels of the training set, get the training set accuracy
Throughout the exercise, you will be using the script ex5.m. This script sets up the
dataset for the problems and makes calls to functions that you will write. You do not
need to modify this script. You are only required to modify other functions, by
following the instructions in this assignment.
Where to get help
The exercises in this course use Octave (footnote 1) or MATLAB, a high-level programming
language well-suited for numerical computations.
At the Octave/MATLAB command line, typing help followed by a function
name displays documentation for a built-in function. For example, help plot will
bring up help information for plotting. Further documentation for Octave functions can be found at the Octave documentation pages. MATLAB documentation can be found at the MATLAB documentation pages.
Do not look at any source code written by others or share your source code with
others.
1 Neural Networks
In the previous exercise, you implemented feedforward propagation for neural
networks and used it to predict handwritten digits with the weights we provided.
In this exercise, you will implement the backpropagation algorithm to learn the
parameters for the neural network.
The provided script, ex5.m, will help you step through this exercise.
1.1 Visualizing the data
In the first part of ex5.m, the code will load the data and display it on a 2-dimensional plot (Figure 1) by calling the function displayData.
Footnote 1: Octave is a free alternative to MATLAB. For the programming exercises, you are free to use either Octave or MATLAB.
Figure 1: Examples from the dataset
This is the same dataset that you used in the previous exercise. There are 5000
training examples in ex5data1.mat, where each training example is a 20 pixel by
20 pixel grayscale image of the digit. Each pixel is represented by a floating point
number indicating the grayscale intensity at that location. The 20 by 20 grid of
pixels is “unrolled” into a 400-dimensional vector. Each of these training examples
becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X
where every row is a training example for a handwritten digit image.
The second part of the training set is a 5000-dimensional vector y that contains
labels for the training set. To make things more compatible with Octave/MATLAB
indexing, where there is no zero index, we have mapped the digit zero to the value
ten. Therefore, a “0” digit is labeled as “10”, while the digits “1” to “9” are labeled
as “1” to “9” in their natural order.
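The label convention above can be undone with a single call when you want to show the actual digit. This is a small illustrative sketch; the label vector and the variable name digit are made up for the example and are not part of the starter code.

```octave
% Hypothetical label vector using the exercise's convention (10 means digit 0).
y = [10; 1; 2; 9; 10];

% mod maps the label 10 back to the digit 0 and leaves 1..9 unchanged.
digit = mod(y, 10);
disp(digit');
```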
1.2 Model representation
Our neural network is shown in Figure 2. It has 3 layers: an input layer, a hidden
layer and an output layer. Recall that our inputs are pixel values of digit images.
Since the images are of size 20 × 20, this gives us 400 input layer units (not
counting the extra bias unit which always outputs +1). The training data will be
loaded into the variables X and y by the ex5.m script.
You have been provided with a set of network parameters (Θ(1), Θ(2)) already
trained by us. These are stored in ex5weights.mat and will be loaded by ex5.m
into Theta1 and Theta2. The parameters have dimensions that are sized for a neural
network with 25 units in the second layer and 10 output units (corresponding to
the 10 digit classes).
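% Load saved matrices from file
load('ex5weights.mat');
% The matrices Theta1 and Theta2 will now be in your workspace
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26
Figure 2: Neural network model.
1.3 Feedforward and cost function
Now you will implement the cost function and gradient for the neural network.
First, complete the code in nnCostFunction.m to return the cost.
Recall that the cost function for the neural network (without regularization) is
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log\left( (h_\theta(x^{(i)}))_k \right) - (1 - y_k^{(i)}) \log\left( 1 - (h_\theta(x^{(i)}))_k \right) \right]$$
where $h_\theta(x^{(i)})$ is computed as shown in Figure 2 and K = 10 is the total number of possible labels. Note that $(h_\theta(x^{(i)}))_k = a_k^{(3)}$ is the activation (output value) of the k-th output unit. Also, recall that whereas the original labels (in the variable y) were 1, 2, ..., 10, for the purpose of training a neural network, we need to recode the labels as vectors containing only values 0 or 1, so that
$$y = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots \quad \text{or} \quad \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$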
For example, if $x^{(i)}$ is an image of the digit 5, then the corresponding $y^{(i)}$ (that you should use with the cost function) should be a 10-dimensional vector with $y_5 = 1$, and the other elements equal to 0.
You should implement the feedforward computation that computes $h_\theta(x^{(i)})$ for every example i and sum the cost over all examples. Your code should also work for a dataset of any size, with any number of labels (you can assume that there are always at least 3 labels, that is, K ≥ 3).
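The label recoding and the sum over examples and labels can be tried out on a toy problem. The following is a minimal sketch, not the reference solution: the network sizes, the deterministic sin-based weights and all variable names (Y, a1, a2, h) are assumptions chosen for illustration, and it uses a vectorized feedforward pass instead of a for-loop.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy problem: m = 4 examples, n = 3 inputs, 5 hidden units, K = 3 labels.
m = 4; n = 3; hidden = 5; K = 3;
X = reshape(sin(1:m*n), m, n);                            % m x n inputs
y = [1; 3; 2; 3];                                         % labels in 1..K
Theta1 = reshape(sin(1:hidden*(n+1)), hidden, n+1) / 10;  % hidden x (n+1)
Theta2 = reshape(sin(1:K*(hidden+1)), K, hidden+1) / 10;  % K x (hidden+1)

% Recode the labels: row i of Y is the 0/1 vector y^(i).
I = eye(K);
Y = I(y, :);                                              % m x K

% Feedforward for all examples at once (rows are examples).
a1 = [ones(m, 1) X];                                      % add bias column
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];                  % add bias column
h  = sigmoid(a2 * Theta2');                               % h(i,k) = (h_theta(x^(i)))_k

% Unregularized cost: sum over examples i and labels k, divided by m.
J = (1 / m) * sum(sum(-Y .* log(h) - (1 - Y) .* log(1 - h)));
fprintf('Toy cost: %f\n', J);
```

Indexing the identity matrix with y is one compact way to build the 0/1 vectors; a loop that sets one entry per example works equally well.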
1.4 Regularized cost function
The cost function for neural networks with regularization is given by
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log\left( (h_\theta(x^{(i)}))_k \right) - (1 - y_k^{(i)}) \log\left( 1 - (h_\theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \left[ \sum_{j=1}^{25} \sum_{k=1}^{400} \left( \Theta_{j,k}^{(1)} \right)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \left( \Theta_{j,k}^{(2)} \right)^2 \right]$$
You can assume that the neural network will only have 3 layers: an input layer, a hidden layer and an output layer. However, your code should work for any number of input units, hidden units and output units. While we have explicitly listed the indices above for $\Theta^{(1)}$ and $\Theta^{(2)}$ for clarity, do note that your code should in general work with $\Theta^{(1)}$ and $\Theta^{(2)}$ of any size.
Implementation Note: The matrix X contains the examples in rows (i.e., X(i,:)' is the i-th training example $x^{(i)}$, expressed as an n × 1 vector). When you complete the code in nnCostFunction.m, you will need to add the column of 1's to the X matrix. The parameters for each unit in the neural network are represented in Theta1 and Theta2 as one row. Specifically, the first row of Theta1 corresponds to the first hidden unit in the second layer. You can use a for-loop over the examples to compute the cost.
Note that you should not be regularizing the terms that correspond to the bias. For the matrices Theta1 and Theta2, this corresponds to the first column of each matrix. You should now add regularization to your cost function. Notice that you can first compute the unregularized cost function J using your existing nnCostFunction.m and then later add the cost for the regularization terms.
Once you are done, ex5.m will call your nnCostFunction using the loaded set
of parameters for Theta1 and Theta2, and λ = 1.
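The regularization term can be checked by hand on very small matrices. The sketch below uses made-up values for Theta1, Theta2, m and the unregularized cost (J_unreg is a stand-in, not a value from the exercise); the point is only that the first column of each matrix is left out of the sum of squares.

```octave
% Illustrative values only; in nnCostFunction.m these come from the arguments.
m = 4;
lambda = 1;
Theta1 = [0.5 1 -1; 2 0 3];   % 2 x 3, first column holds the bias weights
Theta2 = [-1 2 0.5];          % 1 x 3, first column holds the bias weight
J_unreg = 0.7;                % stand-in for the unregularized cost

% Sum of squares over every column except the first (the bias column).
reg = (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                            sum(sum(Theta2(:, 2:end) .^ 2)));
J = J_unreg + reg;
fprintf('Regularization term: %f, regularized cost: %f\n', reg, J);
```

Here the non-bias squares add up to 11 for Theta1 and 4.25 for Theta2, so the term is 15.25 / 8 = 1.90625.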
2 Backpropagation
In this part of the exercise, you will implement the backpropagation algorithm to
compute the gradient for the neural network cost function. You will need to
complete the nnCostFunction.m so that it returns an appropriate value for grad.
Once you have computed the gradient, you will be able to train the neural network
by minimizing the cost function J(Θ) using an advanced optimizer such as fmincg.
You will first implement the backpropagation algorithm to compute the
gradients for the parameters for the (unregularized) neural network. After you
have verified that your gradient computation for the unregularized case is correct,
you will implement the gradient for the regularized neural network.
2.1 Sigmoid gradient
To help you get started with this part of the exercise, you will first implement the
sigmoid gradient function. The gradient for the sigmoid function can be computed
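as
$$g'(z) = \frac{d}{dz} g(z) = g(z)(1 - g(z))$$
where
$$\text{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}}.$$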
When you are done, try testing a few values by calling sigmoidGradient(z) at the
Octave/MATLAB command line. For large values (both positive and negative) of
z, the gradient should be close to 0. When z = 0, the gradient should be exactly
0.25. Your code should also work with vectors and matrices. For a matrix, your
function should perform the sigmoid gradient function on every element.
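One way to convince yourself of the closed-form derivative is to compare it with a finite-difference estimate. This is a hedged sketch using anonymous functions named g and dg (names chosen here for illustration, not the contents of sigmoidGradient.m); the step size 1e-5 is an arbitrary choice.

```octave
g  = @(z) 1 ./ (1 + exp(-z));      % sigmoid, applied element-wise
dg = @(z) g(z) .* (1 - g(z));      % closed-form derivative, element-wise

z = [-10 -1 0 1 10];
step = 1e-5;
dg_numeric = (g(z + step) - g(z - step)) / (2 * step);   % central difference

max_err = max(abs(dg(z) - dg_numeric));
fprintf('Largest gap between closed form and finite difference: %g\n', max_err);

% Because the operations are element-wise, matrices work as well.
D = dg(zeros(2, 3));               % every entry is 0.25
```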
2.2 Random initialization
When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to randomly select values for $\Theta^{(l)}$ uniformly in the range $[-\epsilon_{init}, \epsilon_{init}]$. You should use $\epsilon_{init} = 0.12$. This range of values ensures that the parameters are kept small and makes the learning more efficient.
Your job is to complete randInitializeWeights.m to initialize the weights for Θ; modify the file and fill in the following code:
% Randomly initialize the weights to small values
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
2.3 Backpropagation
Figure 3: Backpropagation Updates.
Now, you will implement the backpropagation algorithm. Recall that the intuition behind the backpropagation algorithm is as follows. Given a training example $(x^{(t)}, y^{(t)})$, we will first run a “forward pass” to compute all the activations throughout the network, including the output value of the hypothesis $h_\Theta(x)$. Then, for each node j in layer l, we would like to compute an “error term” $\delta_j^{(l)}$ that measures how much that node was “responsible” for any errors in our output.
For an output node, we can directly measure the difference between the network’s activation and the true target value, and use that to define $\delta_j^{(3)}$ (since layer 3 is the output layer). For the hidden units, you will compute $\delta_j^{(l)}$ based on a weighted average of the error terms of the nodes in layer (l + 1).
In detail, here is the backpropagation algorithm (also depicted in Figure 3). You should implement steps 1 to 4 in a loop that processes one example at a time. Concretely, you should implement a for-loop for t = 1:m and place steps 1-4 below inside the for-loop, with the t-th iteration performing the calculation on the t-th training example $(x^{(t)}, y^{(t)})$. Step 5 will divide the accumulated gradients by m to obtain the gradients for the neural network cost function.
1. Set the input layer’s values $(a^{(1)})$ to the t-th training example $x^{(t)}$. Perform a feedforward pass (Figure 2), computing the activations $(z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)})$ for layers 2 and 3. Note that you need to add a +1 term to ensure that the vectors of activations for layers $a^{(1)}$ and $a^{(2)}$ also include the bias unit. In Octave/MATLAB, if a_1 is a column vector, adding one corresponds to a_1 = [1 ; a_1].
2. For each output unit k in layer 3 (the output layer), set
$$\delta_k^{(3)} = (a_k^{(3)} - y_k)$$
where $y_k \in \{0, 1\}$ indicates whether the current training example belongs to class k ($y_k = 1$), or if it belongs to a different class ($y_k = 0$). You may find logical arrays helpful for this task (explained in the previous programming exercise).
3. For the hidden layer l = 2, set
$$\delta^{(2)} = \left( \Theta^{(2)} \right)^T \delta^{(3)} \mathbin{.*} g'(z^{(2)})$$
4. Accumulate the gradient from this example using the following formula.
$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} \left( a^{(l)} \right)^T$$
Note that you should skip or remove $\delta_0^{(2)}$. In Octave/MATLAB, removing $\delta_0^{(2)}$ corresponds to delta_2 = delta_2(2:end).
5. Obtain the (unregularized) gradient for the neural network cost function by multiplying the accumulated gradients by 1/m:
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)}$$
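Steps 1 to 5 can be sketched on a toy network as follows. This is an illustrative outline under stated assumptions, not the reference solution: the sizes, the sin-based weights and the names Delta1, Delta2, Theta1_grad and Theta2_grad are chosen here for the example. Instead of computing the full hidden-layer error and then removing its first entry, the sketch drops the bias column of Theta2 before multiplying, which gives the same result.

```octave
sigmoid     = @(z) 1 ./ (1 + exp(-z));
sigmoidGrad = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Toy problem (sizes and values are illustrative assumptions).
m = 4; n = 3; hidden = 5; K = 3;
X = reshape(sin(1:m*n), m, n);
y = [1; 3; 2; 3];
Theta1 = reshape(sin(1:hidden*(n+1)), hidden, n+1) / 10;
Theta2 = reshape(sin(1:K*(hidden+1)), K, hidden+1) / 10;

Delta1 = zeros(size(Theta1));   % gradient accumulators
Delta2 = zeros(size(Theta2));

for t = 1:m
    % Step 1: forward pass for example t, using column vectors.
    a1 = [1; X(t, :)'];
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);

    % Step 2: output-layer error; yk is the 0/1 recoding of y(t).
    yk = ((1:K)' == y(t));
    delta3 = a3 - yk;

    % Step 3: hidden-layer error without the bias entry delta_0^(2).
    delta2 = (Theta2(:, 2:end)' * delta3) .* sigmoidGrad(z2);

    % Step 4: accumulate the contribution of this example.
    Delta2 = Delta2 + delta3 * a2';
    Delta1 = Delta1 + delta2 * a1';
end

% Step 5: average over the m examples.
Theta1_grad = Delta1 / m;
Theta2_grad = Delta2 / m;
```

Printing size(delta2), size(a1) and so on inside the loop is a quick way to track down dimension mismatches.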
After you have implemented the backpropagation algorithm, the script ex5.m
will proceed to run gradient checking on your implementation. The gradient check
will allow you to increase your confidence that your code is computing the
gradients correctly.
Implementation Note: You should implement the backpropagation algorithm
only after you have successfully completed the feedforward and cost functions.
While implementing the backpropagation algorithm, it is often useful to use
the size function to print out the sizes of the variables you are working with if
you run into dimension mismatch errors (“nonconformant arguments” errors
in Octave/MATLAB).
2.4 Gradient checking
In your neural network, you are minimizing the cost function J(Θ). To perform
gradient checking on your parameters, you can imagine “unrolling” the parameters
Θ(1),Θ(2) into a long vector θ. By doing so, you can think of the cost function being
J(θ) instead and use the following gradient checking procedure.
Suppose you have a function $f_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; you’d like to check if $f_i$ is outputting correct derivative values.
$$\text{Let } \theta^{(i+)} = \theta + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix} \quad \text{and} \quad \theta^{(i-)} = \theta - \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix}$$
So, $\theta^{(i+)}$ is the same as θ, except its i-th element has been incremented by ε. Similarly, $\theta^{(i-)}$ is the corresponding vector with the i-th element decreased by ε. You can now numerically verify $f_i(\theta)$’s correctness by checking, for each i, that:
$$f_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\epsilon}$$
The degree to which these two values should approximate each other will depend on the details of J. But assuming $\epsilon = 10^{-4}$, you’ll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).
We have implemented the function to compute the numerical gradient for you in computeNumericalGradient.m. While you are not required to modify the file, we highly encourage you to take a look at the code to understand how it works.
The next step of ex5.m runs the provided function checkNNGradients.m, which creates a small neural network and dataset that will be
used for checking your gradients. If your backpropagation implementation is
correct, you should see a relative difference that is less than 1e-9.
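The checking procedure can be tried on a function whose gradient is known exactly. The sketch below is an assumption-laden illustration, not the contents of computeNumericalGradient.m or checkNNGradients.m: the cubic cost J, its gradient f, and the relative-difference formula (norm of the difference divided by norm of the sum) are choices made for this example.

```octave
% A cost with a known gradient, standing in for the neural network cost.
J = @(theta) sum(theta .^ 3);
f = @(theta) 3 * theta .^ 2;       % analytic gradient being checked

theta = [1; -2; 0.5];
e = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(i) = e;                % change only the i-th element
    numgrad(i) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
end

% One common way to summarize the agreement in a single number.
rel_diff = norm(numgrad - f(theta)) / norm(numgrad + f(theta));
disp([numgrad f(theta)]);          % the two columns should be very close
fprintf('Relative difference: %g\n', rel_diff);
```

Each element of theta costs two evaluations of J, which is why this check is only practical for small parameter vectors.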
Practical Tip: When performing gradient checking, it is much more efficient
to use a small neural network with a relatively small number of input units and
hidden units, thus having a relatively small number of parameters. Each
dimension of θ requires two evaluations of the cost function and this can be
expensive. In the function checkNNGradients, our code creates a small
random model and dataset which is used with computeNumericalGradient
for gradient checking. Furthermore, after you are confident that your gradient
computations are correct, you should turn off gradient checking before running
your learning algorithm.
2.5 Regularized Neural Networks
After you have successfully implemented the backpropagation algorithm, you will
add regularization to the gradient. To account for regularization, it turns out that
you can add this as an additional term after computing the gradients using
backpropagation.
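Specifically, after you have computed $\Delta_{ij}^{(l)}$ using backpropagation, you should add regularization using
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)} \quad \text{for } j = 0$$
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m} \Delta_{ij}^{(l)} + \frac{\lambda}{m} \Theta_{ij}^{(l)} \quad \text{for } j \geq 1$$
Note that you should not be regularizing the first column of $\Theta^{(l)}$ which is used for the bias term. Furthermore, in the parameters $\Theta_{ij}^{(l)}$, i is indexed starting from 1, and j is indexed starting from 0. Thus,
$$\Theta^{(l)} = \begin{bmatrix} \Theta_{1,0}^{(l)} & \Theta_{1,1}^{(l)} & \cdots \\ \Theta_{2,0}^{(l)} & \Theta_{2,1}^{(l)} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$$
Somewhat confusingly, indexing in Octave/MATLAB starts from 1 (for both i and j), thus Theta1(2, 1) actually corresponds to $\Theta_{2,0}^{(1)}$ (i.e., the entry in the second row, first column of the matrix shown above, with l = 1).
Now modify your code that computes grad in nnCostFunction to account for regularization. After you are done, the ex5.m script will proceed to run gradient checking on your implementation. If your code is correct, you should expect to see a relative difference that is less than 1e-9.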
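The two cases (j = 0 and j ≥ 1) translate into leaving the first column alone and adjusting the rest. The following sketch uses made-up numbers; Delta1 is a stand-in for gradients already accumulated by backpropagation, and the variable names are assumptions for illustration.

```octave
% Illustrative values only.
m = 4;
lambda = 1;
Theta1 = [0.5 1 -1; 2 0 3];          % first column holds the bias weights
Delta1 = [0.4 0.8 -1.2; 1.6 0 2];    % stand-in for accumulated gradients

% Unregularized gradient for every entry (this is the j = 0 case as well).
Theta1_grad = Delta1 / m;

% Add (lambda / m) * Theta only for columns 2:end, i.e. j >= 1.
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
disp(Theta1_grad);
```

With these numbers the first column stays at [0.1; 0.4] while the remaining columns become [0.45 -0.55; 0 1.25].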
Practical Tip: Gradient checking works for any function where you are computing the cost and the gradient. Concretely, you can use the same computeNumericalGradient.m function to check if your gradient implementations for the other exercises are correct too (e.g., logistic regression’s cost function).
2.6 Learning parameters using fmincg
After you have successfully implemented the neural network cost function and gradient computation, the next step of the ex5.m script will use fmincg to learn a good set of parameters.
After the training completes, the ex5.m script will proceed to report the training accuracy of your classifier by computing the percentage of examples it got correct. It is possible to get higher training accuracies by training the neural network for more iterations. We encourage you to try training the neural network for more iterations (e.g., set MaxIter to 400) and also vary the regularization parameter λ. With the right learning settings, it is possible to get the neural network to perfectly fit the training set.
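The training accuracy mentioned above is the percentage of examples whose predicted label matches y. A small sketch with made-up output activations follows; it only illustrates the idea and is not the code of predict.m, and the matrix h and its values are hypothetical.

```octave
% Hypothetical output activations for 3 examples and K = 2 labels.
h = [0.9 0.1; 0.2 0.8; 0.6 0.4];
y = [1; 2; 2];

% Predicted label = index of the largest output activation in each row.
[max_val, pred] = max(h, [], 2);

% Percentage of examples predicted correctly.
accuracy = mean(double(pred == y)) * 100;
fprintf('Training Set Accuracy: %f\n', accuracy);
```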
3 Visualizing the hidden layer
One way to understand what your neural network is learning is to visualize the representations captured by the hidden units. Informally, given a particular hidden unit, one way to visualize what it computes is to find an input x that will cause it to activate (that is, to have an activation value $a_i^{(l)}$ close to 1). For the neural network you trained, notice that the i-th row of $\Theta^{(1)}$ is a 401-dimensional vector that represents the parameters for the i-th hidden unit. If we discard the bias term, we get a 400-dimensional vector that represents the weights from each input pixel to the hidden unit.
Thus, one way to visualize the “representation” captured by the hidden unit is to reshape this 400-dimensional vector into a 20 × 20 image and display it. The next step of ex5.m does this by using the displayData function and it will show you an image (similar to Figure 4) with 25 units, each corresponding to one hidden unit in the network.
In your trained network, you should find that the hidden units correspond roughly to detectors that look for strokes and other patterns in the input.
Figure 4: Visualization of Hidden Units.
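The reshaping step for a single hidden unit can be sketched as below. The weights here are a deterministic stand-in rather than trained values, the variable names are assumptions, and the plotting calls are left as comments so the snippet also runs without a display; displayData itself arranges all 25 units in one figure.

```octave
% Stand-in for trained weights with the same shape as Theta1 (25 x 401).
Theta1 = reshape(sin(1:25*401), 25, 401);

i = 1;                        % which hidden unit to look at
w = Theta1(i, 2:end);         % discard the bias weight, leaving 1 x 400
img = reshape(w, 20, 20);     % back to a 20 x 20 grid (column by column)

% To view it:
% imagesc(img); colormap(gray); axis image off;
```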
Submission and Grading
After completing this assignment, be sure to use the submit function to submit your
solutions to our servers. The following is a breakdown of how each part of this
exercise is scored.
Part | Related code file | Points
Feedforward cost without regularization | nnCostFunction.m | 30 points
Feedforward cost with regularization (lambda = 1) | nnCostFunction.m | 10 points
Sigmoid gradient evaluated at [1 -0.5 0 0.5 1] | sigmoidGradient.m | 10 points
Train without regularization, predict the labels of the training set, get the training set accuracy (Default MaxIter = 50) | nnCostFunction.m | 50 points
Total Points | | 100 points
You are allowed to submit your solutions multiple times, and we will take only
the highest score into consideration.