Programming Exercise 4:
Multi-class Classification and Neural Networks
Machine Learning
Introduction
In this exercise, you will implement one-vs-all logistic regression and neural
networks to recognize hand-written digits. To get started with the exercise, you will
need to download the starter code and unzip its contents to the directory where you
wish to complete the exercise. If needed, use the cd command in Octave/MATLAB
to change to this directory before starting this exercise.
You can log into your CougarNet and download MATLAB from this
website: https://uh.edu/software-downloads/index.php.
Files included in this exercise
ex4.m - Octave/MATLAB script that steps you through part 1
ex4nn.m - Octave/MATLAB script that steps you through part 2
ex4data1.mat - Training set of hand-written digits
ex4weights.mat - Initial weights for the neural network exercise
submit.m - Submission script that sends your solutions to our servers
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
lrCostFunction.m - Logistic regression cost function
[y] oneVsAll.m - Train a one-vs-all multi-class classifier
[y] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier
[y] predict.m - Neural network prediction function
[y] indicates files you will need to complete
Files to be submitted
[1] ML_ex4 – Include all the code (You need to complete oneVsAll.m,
predictOneVsAll.m and predict.m by yourself)
[2] ex4_report – Directly give the answers to the three questions:
(1) Minimum cost found by the fmincg function for all digits (1, 2, 3 ... 9, 0)
(2) Training set accuracy using one-vs-all logistic regression method
(3) Training set accuracy with neural networks
Throughout the exercise, you will be using the scripts ex4.m and ex4nn.m. These
scripts set up the dataset for the problems and make calls to functions that you will
write. You do not need to modify these scripts. You are only required to modify
other functions, by following the instructions in this assignment.
Where to get help
The exercises in this course use Octave or MATLAB, a high-level programming
language well-suited for numerical computations. (Octave is a free alternative to
MATLAB; for the programming exercises, you are free to use either.)
At the Octave/MATLAB command line, typing help followed by a function
name displays documentation for a built-in function. For example, help plot will
bring up help information for plotting. Further documentation for Octave functions can be found at the Octave documentation pages. MATLAB documentation can be found at the MATLAB documentation pages.
Do not look at any source code written by others or share your source code with others.
1 Multi-class Classification
For this exercise, you will use logistic regression and neural networks to recognize
handwritten digits (from 0 to 9). Automated handwritten digit recognition is
widely used today, from recognizing zip codes (postal codes) on mail envelopes to
recognizing amounts written on bank checks. This exercise will show you how the
methods you’ve learned can be used for this classification task.
In the first part of the exercise, you will extend your previous implementation of
logistic regression and apply it to one-vs-all classification.
1.1 Dataset
You are given a data set in ex4data1.mat that contains 5000 training examples of
handwritten digits. The .mat format means that the data has been saved in a
native Octave/MATLAB matrix format, instead of a text (ASCII) format like a
csv-file. These matrices can be read directly into your program by using the load
command. After loading, matrices of the correct dimensions and values will
appear in your program's memory. The matrices will already be named, so you do
not need to assign names to them.
There are 5000 training examples in ex4data1.mat, where each training
example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is
represented by a floating point number indicating the grayscale intensity at that
location. The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector.
Each of these training examples becomes a single row in our data matrix X. This
gives us a 5000 by 400 matrix X where every row is a training example for a
handwritten digit image.
The second part of the training set is a 5000-dimensional vector y that contains
labels for the training set. To make things more compatible with Octave/MATLAB
indexing, where there is no zero index, we have mapped the digit zero to the value
ten. Therefore, a "0" digit is labeled as "10", while the digits "1" to "9" are labeled
as “1” to “9” in their natural order.
1.2 Implementation
You will begin by visualizing a subset of the training set. In Part 1 of ex4.m, the
code randomly selects 100 rows from X and passes those rows to the displayData
function. This function maps each row to a 20 pixel by 20 pixel grayscale image
and displays the images together. We have provided the displayData function, and
you are encouraged to examine the code to see how it works. After you run this
step, you should see an image like Figure 1.

% Load saved matrices from file
load('ex4data1.mat');
% The matrices X and y will now be in your Octave environment

Figure 1: Examples from the dataset
1.3 Vectorizing Logistic Regression
You will be using multiple one-vs-all logistic regression models to build a multi-class classifier. Since there are 10 classes, you will need to train 10 separate
logistic regression classifiers. To make this training efficient, it is important to
ensure that your code is well vectorized. In this section, you will implement a
vectorized version of logistic regression that does not employ any for loops. You can
use your code in the last exercise as a starting point for this exercise.
1.3.1 Vectorizing the cost function
We will begin by writing a vectorized version of the cost function. Recall that in
(unregularized) logistic regression, the cost function is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
To compute each element in the summation, we have to compute hθ(x(i)) for every
example i, where hθ(x(i)) = g(θT x(i)) and g(z) is the sigmoid function. It turns out
that we can compute this quickly for all our examples by using matrix
multiplication. Let us define X and θ as

$$X = \begin{bmatrix} - \, (x^{(1)})^T \, - \\ - \, (x^{(2)})^T \, - \\ \vdots \\ - \, (x^{(m)})^T \, - \end{bmatrix} \qquad \text{and} \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

Then, by computing the matrix product Xθ, we have

$$X\theta = \begin{bmatrix} - \, (x^{(1)})^T\theta \, - \\ \vdots \\ - \, (x^{(m)})^T\theta \, - \end{bmatrix} = \begin{bmatrix} - \, \theta^T(x^{(1)}) \, - \\ \vdots \\ - \, \theta^T(x^{(m)}) \, - \end{bmatrix}$$

In the last equality, we used the fact that a^T b = b^T a if a and b are vectors. This
allows us to compute the products θTx(i) for all our examples i in one line of code.
Your job is to write the unregularized cost function in the file lrCostFunction.m. Your implementation should use the strategy we presented above to
calculate θTx(i). You should also use a vectorized approach for the rest of the cost
function. A fully vectorized version of lrCostFunction.m should not contain any
loops.
(Hint: You might want to use the element-wise multiplication operation (.*) and
the sum operation sum when writing this function)
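For reference, here is a minimal sketch of such a fully vectorized (unregularized) cost computation, assuming the usual variable names (m for the number of examples, h for the hypothesis values); your own implementation may be structured differently:

% Compute theta' * x^(i) for all examples at once: h is an m x 1 vector
h = sigmoid(X * theta);
% Element-wise products and a single sum replace the loop over examples
J = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));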
Your first step is to implement the sigmoid function in sigmoid.m so it can be called by
the rest of your program. When you are finished, try testing a few values by
calling sigmoid(x) at the Octave/MATLAB command line. For large positive
values of x, the sigmoid should be close to 1, while for large negative values, the
sigmoid should be close to 0. Evaluating sigmoid(0) should give you exactly 0.5.
Your code should also work with vectors and matrices. For a matrix, your function
should perform the sigmoid function on every element.
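A minimal sketch of such a vectorized sigmoid (exp and ./ operate element-wise, so the same line works for scalars, vectors, and matrices):

function g = sigmoid(z)
    % Element-wise logistic function g(z) = 1 / (1 + e^(-z))
    g = 1 ./ (1 + exp(-z));
end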
1.3.2 Vectorizing the gradient
Recall that the gradient of the (unregularized) logistic regression cost is a vector
where the j-th element is defined as

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

To vectorize this operation over the dataset, we start by writing out all the
partial derivatives explicitly for all θj,

$$\nabla J = \begin{bmatrix} \frac{\partial J}{\partial \theta_0} \\ \frac{\partial J}{\partial \theta_1} \\ \vdots \\ \frac{\partial J}{\partial \theta_n} \end{bmatrix} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)} = \frac{1}{m}X^T\left(h_\theta(x) - y\right) \qquad (1)$$

where

$$h_\theta(x) - y = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$

Note that x(i) is a vector, while (hθ(x(i)) − y(i)) is a scalar (single number). To
understand the last step of the derivation, let βi = (hθ(x(i)) − y(i)) and observe that

$$\sum_{i}\beta_i x^{(i)} = \begin{bmatrix} | & & | \\ x^{(1)} & \cdots & x^{(m)} \\ | & & | \end{bmatrix}\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} = X^T\beta$$

where β is the vector of the values βi = (hθ(x(i)) − y(i)).
The expression above allows us to compute all the partial derivatives without any
loops. If you are comfortable with linear algebra, we encourage you to work
through the matrix multiplications above to convince yourself that the vectorized
version does the same computations. You should now implement Equation 1 to
compute the correct vectorized gradient. Once you are done, complete the
function lrCostFunction.m by implementing the gradient.
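For reference, a sketch of one possible vectorized form of Equation 1 (again assuming m examples and the variable names above):

% X' * (h - y) computes X^T (h_theta(x) - y) from Equation 1
h = sigmoid(X * theta);
grad = (1 / m) * (X' * (h - y));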
1.3.3 Vectorizing regularized logistic regression
After you have implemented vectorization for logistic regression, you will now add
regularization to the cost function. Recall that for regularized logistic regression,
the cost function is defined as

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Note that you should not be regularizing θ0, which is used for the bias term.
Correspondingly, the partial derivative of the regularized logistic regression cost
for θj is defined as

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$
Now modify your code in lrCostFunction to account for regularization. Once
again, you should not put any loops into your code.
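One common loop-free pattern is sketched below; remember that Octave/MATLAB indexing is 1-based, so theta(1) holds θ0. This assumes J and grad already hold the unregularized values:

% Zero out the bias entry so it contributes nothing to the penalty terms
temp = theta;
temp(1) = 0;
J = J + (lambda / (2 * m)) * sum(temp .^ 2);
grad = grad + (lambda / m) * temp;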
Debugging Tip: Vectorizing code can sometimes be tricky. One common strategy
for debugging is to print out the sizes of the matrices you are working with using
the size function. For example, given a data matrix X of size 100 × 20 (100
examples, 20 features) and θ, a vector with dimensions 20×1, you can observe that
Xθ is a valid multiplication operation, while Xθ is not. Furthermore, if you have a
non-vectorized version of your code, you can compare the output of your
vectorized code and non-vectorized code to make sure that they produce the same
outputs.
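For example, a quick dimension check at the command line for the sizes described above:

size(X)          % ans = 100 20
size(theta)      % ans = 20 1
size(X * theta)  % ans = 100 1 (theta * X would raise a dimension error)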
1.4 One-vs-all Classification
In this part of the exercise, you will implement one-vs-all classification by training
multiple regularized logistic regression classifiers, one for each of the K classes in
our dataset (Figure 1). In the handwritten digits dataset, K = 10, but your code
should work for any value of K.
You should now complete the code in oneVsAll.m to train one classifier for
each class. In particular, your code should return all the classifier parameters in a
matrix Θ ∈ R^(K×(N+1)), where each row of Θ corresponds to the learned logistic
regression parameters for one class. You can do this with a “for”-loop from 1 to
K, training each classifier independently.
Note that the y argument to this function is a vector of labels from 1 to 10,
where we have mapped the digit “0” to the label 10 (to avoid confusions with
indexing).
When training the classifier for class k ∈ {1,...,K}, you will want an m-dimensional vector of labels y, where yj ∈ {0, 1} indicates whether the j-th training
instance belongs to class k (yj = 1) or to a different class (yj = 0). You
may find logical arrays helpful for this task.
Furthermore, you will be using fmincg for this exercise (instead of fminunc).
fmincg works similarly to fminunc, but is more efficient for dealing with
a large number of parameters.
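As an illustration, here is a minimal sketch of the training loop you might write inside oneVsAll.m. It assumes lrCostFunction takes (theta, X, y, lambda) and returns the cost and gradient, that X already includes the column of 1's, and that all_theta = zeros(K, n + 1) has been preallocated; your own structure may differ:

options = optimset('GradObj', 'on', 'MaxIter', 50);
for k = 1:K
    initial_theta = zeros(n + 1, 1);
    % (y == k) is the binary label vector for class k (see the logical arrays tip below)
    theta_k = fmincg(@(t) lrCostFunction(t, X, (y == k), lambda), initial_theta, options);
    all_theta(k, :) = theta_k';
end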
Octave/MATLAB Tip: When implementing the vectorization for regularized
logistic regression, you might often want to sum and update only certain elements
of θ. In Octave/MATLAB, you can index into matrices to access and update
only certain elements. For example, A(:, 3:5) = B(:, 1:3) will replace
columns 3 to 5 of A with columns 1 to 3 of B. One special keyword you
can use in indexing is the end keyword, which allows us to select
columns (or rows) until the end of the matrix. For example, A(:, 2:end) will
only return elements from the 2nd to the last column of A. Thus, you could use this
together with the sum and .^ operations to compute the sum of only the elements
you are interested in (e.g., sum(z(2:end).^2)). In the starter code, lrCostFunction.m, we have also provided hints on yet another possible method of
computing the regularized gradient.
Octave/MATLAB Tip: Logical arrays in Octave/MATLAB are arrays which
contain binary (0 or 1) elements. In Octave/MATLAB, evaluating the
expression a == b for a vector a (of size m×1) and scalar b will return a vector
of the same size as a with ones at positions where the elements of a are equal
to b and zeroes where they are different. To see how this works for yourself,
try the following code in Octave/MATLAB:
a = 1:10; % Create a and b
b = 3;
a == b % You should try different values of b here
After you have correctly completed the code for oneVsAll.m, the script ex4.m
will continue to use your oneVsAll function to train a multi-class classifier.
1.4.1 One-vs-all Prediction
After training your one-vs-all classifier, you can now use it to predict the digit
contained in a given image. For each input, you should compute the “probability”
that it belongs to each class using the trained logistic regression classifiers. Your
one-vs-all prediction function will pick the class for which the corresponding
logistic regression classifier outputs the highest probability and return the class
label (1, 2,..., or K) as the prediction for the input example.
You should now complete the code in predictOneVsAll.m to use the one-vs-all
classifier to make predictions.
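For illustration, a minimal sketch of the vectorized prediction, assuming all_theta is the K × (N+1) matrix returned by oneVsAll and X already includes the column of 1's:

% "Probabilities" for every example under each classifier: an m x K matrix
probs = sigmoid(X * all_theta');
% For each row, p holds the column index (class label) of the largest output
[~, p] = max(probs, [], 2);

Since the sigmoid is monotonic, taking the max over X * all_theta' directly would produce the same labels.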
Once you are done, ex4.m will call your predictOneVsAll function using the
learned value of Θ.
2 Neural Networks
In the previous part of this exercise, you implemented multi-class logistic regression
to recognize handwritten digits. However, logistic regression cannot form more
complex hypotheses as it is only a linear classifier.
In this part of the exercise, you will implement a neural network to recognize
handwritten digits using the same training set as before. The neural network will be
able to represent complex models that form non-linear hypotheses. For this week,
you will be using parameters from a neural network that we have already trained.
Your goal is to implement the feedforward propagation algorithm to use our
weights for prediction. In next week's exercise, you will write the backpropagation
algorithm for learning the neural network parameters.
The provided script, ex4nn.m, will help you step through this exercise.
2.1 Model representation
Our neural network is shown in Figure 2. It has 3 layers – an input layer, a hidden
layer and an output layer. Recall that our inputs are pixel values of digit images.
Since the images are of size 20 × 20, this gives us 400 input layer units (excluding
the extra bias unit which always outputs +1). As before, the training data will be
loaded into the variables X and y.
You have been provided with a set of network parameters (Θ(1), Θ(2)) already
trained by us. These are stored in ex4weights.mat and will be loaded by ex4nn.m
into Theta1 and Theta2. The parameters have dimensions that are sized for a neural
network with 25 units in the second layer and 10 output units (corresponding to
the 10 digit classes).
Figure 2: Neural network model.
2.2 Feedforward Propagation and Prediction
Now you will implement feedforward propagation for the neural network. You will
need to complete the code in predict.m to return the neural network’s prediction.
You should implement the feedforward computation that computes hθ(x(i)) for
every example i and returns the associated predictions. Similar to the one-vs-all
classification strategy, the prediction from the neural network will be the label that
has the largest output (hθ(x))k.
% Load saved matrices from file
load('ex4weights.mat');
% The matrices Theta1 and Theta2 will now be in your Octave
% environment
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26
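For illustration, a minimal sketch of the feedforward computation in predict.m with these dimensions (the names a1, a2, h are just one convention; note the bias column of 1's added before each layer, as the Implementation Note below discusses):

% m x 401: add the bias column to the input layer
a1 = [ones(m, 1) X];
% Hidden layer activations: (m x 401) * (401 x 25) gives m x 25
a2 = sigmoid(a1 * Theta1');
% m x 26: add the bias unit to the hidden layer
a2 = [ones(m, 1) a2];
% Output layer: (m x 26) * (26 x 10) gives m x 10
h = sigmoid(a2 * Theta2');
% The prediction is the label of the largest output unit
[~, p] = max(h, [], 2);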
Once you are done, ex4nn.m will call your predict function using the loaded set
of parameters for Theta1 and Theta2. After that, an interactive sequence will
launch displaying images from the training set one at a time, while the console
prints out the predicted label for the displayed image. To stop the image sequence,
press Ctrl-C.
Submission and Grading
After completing this assignment, be sure to use the submit function to submit your
solutions to our servers. The following is a breakdown of how each part of this
exercise is scored.
Part                                              Related code file    Points
Minimum cost found by fmincg function
for all digits (1, 2, 3 ... 9, 0)                 oneVsAll.m           40 points
Training set accuracy using one-vs-all
logistic regression method                        predictOneVsAll.m    30 points
Neural network prediction function                predict.m            40 points
Total Points                                                           100 points
You are allowed to submit your solutions multiple times, and we will take only
the highest score into consideration.
Implementation Note: The matrix X contains the examples in rows. When you
complete the code in predict.m, you will need to add the column of 1's to the
matrix. The matrices Theta1 and Theta2 contain the parameters for each unit
in rows. Specifically, the first row of Theta1 corresponds to the first hidden unit
in the second layer. In Octave/MATLAB, when you compute z(2) = Θ(1)a(1), be
sure that you index (and if necessary, transpose) X correctly so that you get a(1)
as a column vector.
