Programming Exercise 1: Linear Regression

1 Simple Octave function
The first part of ex1.m gives you practice with Octave syntax and the homework
submission process. (Octave is a free alternative to MATLAB; for the programming
exercises, you are free to use either Octave or MATLAB.) In the file warmUpExercise.m,
you will find the outline of an Octave function. Modify it to return a 5 x 5 identity
matrix by filling in the following code:

A = eye(5);

When you are finished, run ex1.m (assuming you are in the correct directory,
type "ex1" at the Octave prompt) and you should see output similar
to the following:
ans =
Diagonal Matrix
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Now ex1.m will pause until you press any key, and then will run the code
for the next part of the assignment. If you wish to quit, typing ctrl-c will
stop the program in the middle of its run.
1.1 Submitting Solutions
After completing a part of the exercise, you can submit your solutions for
grading by typing submit at the Octave command line. The submission
script will prompt you for your username and password and ask you which
files you want to submit. You can obtain a submission password from the
website's "Programming Exercises" page.
You should now submit the warm up exercise.
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration. To prevent rapid-fire guessing, the
system enforces a minimum of 5 minutes between submissions.
2 Linear regression with one variable
In this part of the exercise, you will implement linear regression with one
variable to predict profits for a food truck. Suppose you are the CEO of a
restaurant franchise and are considering different cities for opening a new
outlet. The chain already has trucks in various cities and you have data for
profits and populations from the cities.
You would like to use this data to help you select which city to expand
to next.
The file ex1data1.txt contains the dataset for our linear regression problem.
The first column is the population of a city and the second column is
the profit of a food truck in that city. A negative value for profit indicates a
loss.
The ex1.m script has already been set up to load this data for you.
2.1 Plotting the Data
Before starting on any task, it is often useful to understand the data by
visualizing it. For this dataset, you can use a scatter plot to visualize the
data, since it has only two properties to plot (profit and population). (Many
other problems that you will encounter in real life are multi-dimensional and
can't be plotted on a 2-d plot.)
In ex1.m, the dataset is loaded from the data file into the variables X
and y:
data = load('ex1data1.txt'); % read comma separated data
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples
Next, the script calls the plotData function to create a scatter plot of
the data. Your job is to complete plotData.m to draw the plot; modify the
file and fill in the following code:
plot(x, y, 'rx', 'MarkerSize', 10);      % Plot the data
ylabel('Profit in $10,000s');            % Set the y-axis label
xlabel('Population of City in 10,000s'); % Set the x-axis label
Now, when you continue to run ex1.m, your end result should look like
Figure 1, with the same red "x" markers and axis labels.
To learn more about the plot command, you can type help plot at the
Octave command prompt or search online for plotting documentation. (To
change the markers to red "x", we used the option 'rx' together with the plot
command, i.e., plot(..,[your options here],.., 'rx'); )
2.2 Gradient Descent
In this part, you will fit the linear regression parameters θ to our dataset
using gradient descent.
[Figure 1: Scatter plot of training data. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s]
2.2.1 Update Equations
The objective of linear regression is to minimize the cost function
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where the hypothesis hθ(x) is given by the linear model

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$
Recall that the parameters of your model are the θj values. These are
the values you will adjust to minimize the cost J(θ). One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}$$
With each step of gradient descent, your parameters θj come closer to the
optimal values that will achieve the lowest cost J(θ).
Implementation Note: We store each example as a row in the X
matrix in Octave. To take into account the intercept term (θ0), we add
an additional first column to X and set it to all ones. This allows us to
treat θ0 as simply another 'feature'.
2.2.2 Implementation
In ex1.m, we have already set up the data for linear regression. In the
following lines, we add another dimension to our data to accommodate the
θ0 intercept term. We also initialize the parameters to 0 and the
learning rate alpha to 0.01.
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1);         % initialize fitting parameters
iterations = 1500;           % number of gradient descent iterations
alpha = 0.01;                % learning rate
2.2.3 Computing the cost J(θ)
As you perform gradient descent to minimize the cost function J(θ),
it is helpful to monitor the convergence by computing the cost. In this
section, you will implement a function to calculate J(θ) so you can check the
convergence of your gradient descent implementation.
Your next task is to complete the code in the file computeCost.m, which
is a function that computes J(θ). As you are doing this, remember that the
variables X and y are not scalar values, but matrices whose rows represent
the examples from the training set.
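For reference, one possible vectorized implementation of computeCost.m is sketched below. It is a minimal sketch, not the only correct way to write the function, and it assumes that X already includes the column of ones and that theta is a 2 x 1 vector.

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression.
%   Vectorized sketch: form the vector of prediction errors for all m
%   examples at once, then square and sum it.
m = length(y);                    % number of training examples
errors = X * theta - y;           % m x 1 vector of prediction errors
J = (errors' * errors) / (2 * m); % scalar cost
end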
Once you have completed the function, the next step in ex1.m will run
computeCost once using θ initialized to zeros, and you will see the cost
printed to the screen.
You should expect to see a cost of 32.07.
You should now submit "compute cost" for linear regression with one
variable.
2.2.4 Gradient descent
Next, you will implement gradient descent in the file gradientDescent.m.
The loop structure has been written for you, and you only need to supply
the updates to θ within each iteration.
As you program, make sure you understand what you are trying to
optimize and what is being updated. Keep in mind that the cost J(θ) is
parameterized by the vector θ, not X and y. That is, we minimize the value
of J(θ) by changing the values of the vector θ, not by changing X or y. Refer to the
equations in this handout and to the video lectures if you are uncertain.
A good way to verify that gradient descent is working correctly is to look
at the value of J(θ) and check that it is decreasing with each step. The
starter code for gradientDescent.m calls computeCost on every iteration
and prints the cost. Assuming you have implemented gradient descent and
computeCost correctly, your value of J(θ) should never increase, and should
converge to a steady value by the end of the algorithm.
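One possible vectorized body for the loop in gradientDescent.m is sketched below. This is a minimal sketch under the assumption that X includes the column of ones and that J_history is the cost-history vector set up by the starter code; the loop structure you were given may differ in detail.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Perform gradient descent to learn theta.
%   Vectorized sketch: each iteration applies one simultaneous update
%   of all theta(j) and records the current cost.
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    errors = X * theta - y;                      % m x 1 prediction errors
    theta = theta - (alpha / m) * (X' * errors); % simultaneous update of all theta(j)
    J_history(iter) = computeCost(X, y, theta);  % track convergence
end
end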
After you are finished, ex1.m will use your final parameters to plot the
linear fit. The result should look something like Figure 2:
Your final values for θ will also be used to make predictions on profits in
areas of 35,000 and 70,000 people. Note the way that the following lines in
ex1.m use matrix multiplication, rather than explicit summation or looping,
to calculate the predictions. This is an example of code vectorization in
Octave.
You should now submit gradient descent for linear regression with one
variable.
predict1 = [1, 3.5] * theta; % predicted profit (in $10,000s) for a population of 35,000
predict2 = [1, 7] * theta;   % predicted profit (in $10,000s) for a population of 70,000
2.3 Debugging
Here are some things to keep in mind as you implement gradient descent:
• Octave array indices start from one, not zero. If you're storing θ0 and
θ1 in a vector called theta, the values will be theta(1) and theta(2).
• If you are seeing many errors at runtime, inspect your matrix operations
to make sure that you're adding and multiplying matrices of compat-
ible dimensions. Printing the dimensions of variables with the size
command will help you debug.
[Figure 2: Training data with linear regression fit. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s; legend: Training data, Linear regression]
• By default, Octave interprets math operators to be matrix operators.
This is a common source of size incompatibility errors. If you don't want
matrix multiplication, you need to add the "dot" notation to specify this
to Octave. For example, A*B does a matrix multiply, while A.*B does
an element-wise multiplication.
2.4 Visualizing J(θ)
To understand the cost function J(θ) better, you will now plot the cost over
a 2-dimensional grid of θ0 and θ1 values. You will not need to code anything
new for this part, but you should understand how the code you have written
already is creating these images.
In the next step of ex1.m, there is code set up to calculate J(θ) over a
grid of values using the computeCost function that you wrote.
% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end
After these lines are executed, you will have a 2-D array of J(θ) values.
The script ex1.m will then use these values to produce surface and contour
plots of J(θ) using the surf and contour commands. The plots should look
something like Figure 3:
[Figure 3: Cost function J(θ). (a) Surface plot over θ0 and θ1; (b) Contour plot over θ0 and θ1, showing the minimum]
The purpose of these graphs is to show you how J(θ) varies with
changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global
minimum. (This is easier to see in the contour plot than in the 3D surface
plot). This minimum is the optimal point for θ0 and θ1, and each step of
gradient descent moves closer to this point.
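If you want to experiment with these plots yourself, the calls below are a minimal sketch of how surf and contour can be used on the J_vals grid computed above. The grid ranges and contour levels are illustrative assumptions, not necessarily the exact values used in ex1.m.

% Sketch: grid of theta values (illustrative ranges matching Figure 3)
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
% ... fill J_vals with computeCost as in the loop above ...

figure;
% surf expects rows of the Z argument to vary with the y-axis values,
% so J_vals is transposed before plotting
surf(theta0_vals, theta1_vals, J_vals');
xlabel('\theta_0'); ylabel('\theta_1');

figure;
% logarithmically spaced contour levels make the bowl shape visible
contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 3, 20));
xlabel('\theta_0'); ylabel('\theta_1');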
Extra Credit Exercises (optional)
If you have successfully completed the material above, congratulations! You
now understand linear regression and should be able to start using it on your
own datasets.
For the rest of this programming exercise, we have included the following
optional extra credit exercises. These exercises will help you gain a deeper
understanding of the material, and if you are able to do so, we encourage
you to complete them as well.
3 Linear regression with multiple variables
In this part, you will implement linear regression with multiple variables to
predict the prices of houses. Suppose you are selling your house and you
want to know what a good market price would be. One way to do this is to
first collect information on recent houses sold and make a model of housing
prices.
The file ex1data2.txt contains a training set of housing prices in
Portland, Oregon. The first column is the size of the house (in square feet), the
second column is the number of bedrooms, and the third column is the price
of the house.
The ex1_multi.m script has been set up to help you step through this
exercise.
3.1 Feature Normalization
The ex1_multi.m script will start by loading and displaying some values
from this dataset. By looking at the values, note that house sizes are about
1000 times the number of bedrooms. When features differ by orders of
magnitude, first performing feature scaling can make gradient descent converge
much more quickly.
Your task here is to complete the code in featureNormalize.m to
• Subtract the mean value of each feature from the dataset.
• After subtracting the mean, additionally scale (divide) the feature values
  by their respective "standard deviations."
The standard deviation is a way of measuring how much variation there
is in the range of values of a particular feature (most data points will lie
within ±2 standard deviations of the mean); this is an alternative to taking
the range of values (max-min). In Octave, you can use the "std" function to
compute the standard deviation. For example, inside featureNormalize.m,
the quantity X(:,1) contains all the values of x1 (house sizes) in the training
set, so std(X(:,1)) computes the standard deviation of the house sizes.
At the time that featureNormalize.m is called, the extra column of 1's
corresponding to x0 = 1 has not yet been added to X (see ex1_multi.m for
details).
You will do this for all the features and your code should work with
datasets of all sizes (any number of features / examples). Note that each
column of the matrix X corresponds to one feature.
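One possible column-wise implementation of featureNormalize.m is sketched below. It is a minimal sketch; it assumes the function also returns the mean and standard deviation it used, and the variable names mirror the starter code but are not guaranteed to match it exactly.

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Return a normalized version of X where each feature
%   has zero mean and unit standard deviation. mu and sigma are returned
%   so the same transformation can be applied to new examples later.
mu = mean(X);                % 1 x n row vector of column means
sigma = std(X);              % 1 x n row vector of column standard deviations
X_norm = (X - mu) ./ sigma;  % broadcasting across rows (works in Octave)
end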
You should now submit feature normalization.
Implementation Note: When normalizing the features, it is important
to store the values used for normalization - the mean value and the stan-
dard deviation used for the computations. After learning the parameters
from the model, we often want to predict the prices of houses we have not
seen before. Given a new x value (living room area and number of bed-
rooms), we must first normalize x using the mean and standard deviation
that we had previously computed from the training set.
3.2 Gradient Descent
Previously, you implemented gradient descent on a univariate regression
problem. The only difference now is that there is one more feature in the
matrix X. The hypothesis function and the batch gradient descent update
rule remain unchanged.
You should complete the code in computeCostMulti.m and gradientDescentMulti.m
to implement the cost function and gradient descent for linear regression with
multiple variables. If your code in the previous part (single variable) already
supports multiple variables, you can use it here too.
Make sure your code supports any number of features and is well-vectorized.
You can use 'size(X, 2)' to find out how many features are present in the
dataset.
You should now submit compute cost and gradient descent for linear re-
gression with multiple variables.
Implementation Note: In the multivariate case, the cost function can
also be written in the following vectorized form:

$$J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y})$$

where

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}.$$

The vectorized version is efficient when you're working with numerical
computing tools like Octave. If you are an expert with matrix operations,
you can prove to yourself that the two forms are equivalent.
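Translated directly into Octave, this vectorized form suggests a computeCostMulti.m along the lines of the sketch below. It is a minimal sketch that works for any number of features, assuming X already includes the column of ones.

function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables,
%   using the vectorized form J = (1/2m) * (X*theta - y)' * (X*theta - y).
m = length(y);
errors = X * theta - y;           % m x 1 vector of prediction errors
J = (errors' * errors) / (2 * m); % scalar cost
end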
3.2.1 Optional (ungraded) exercise: Selecting learning rates
In this part of the exercise, you will get to try out different learning rates for
the dataset and find a learning rate that converges quickly. You can change
the learning rate by modifying ex1_multi.m and changing the part of the
code that sets the learning rate.
The next phase in ex1_multi.m will call your gradientDescent.m function
and run gradient descent for about 50 iterations at the chosen learning
rate. The function should also return the history of J(θ) values in a vector
J. After the last iteration, the ex1_multi.m script plots the J values against
the number of iterations.
If you picked a learning rate within a good range, your plot should look
similar to Figure 4. If your graph looks very different, especially if your value
of J(θ) increases or even blows up, adjust your learning rate and try again.
We recommend trying values of the learning rate α on a log-scale, at
multiplicative steps of about 3 times the previous value (i.e., 0.3, 0.1, 0.03, 0.01 and so on).
You may also want to adjust the number of iterations you are running if that
will help you see the overall trend in the curve.
Figure 4: Convergence of gradient descent with an appropriate learning rate
Implementation Note: If your learning rate is too large, J(θ) can
diverge and 'blow up', resulting in values which are too large for computer
calculations. In these situations, Octave will tend to return NaNs. NaN
stands for 'not a number' and is often caused by undefined operations
that involve −∞ and +∞.
Octave Tip: To compare how different learning rates affect
convergence, it's helpful to plot J for several learning rates on the same
figure. In Octave, this can be done by performing gradient descent multiple
times with a 'hold on' command between plots. Concretely, if you've
tried three different values of alpha (you should probably try more values
than this) and stored the costs in J1, J2 and J3, you can use the following
commands to plot them on the same figure:

plot(1:50, J1(1:50), 'b');
hold on;
plot(1:50, J2(1:50), 'r');
plot(1:50, J3(1:50), 'k');

The final arguments 'b', 'r', and 'k' specify different colors for the
plots.
Notice the changes in the convergence curves as the learning rate changes.
With a small learning rate, you should find that gradient descent takes a very
long time to converge to the optimal value. Conversely, with a large learning
rate, gradient descent might not converge or might even diverge!
Using the best learning rate that you found, run the ex1_multi.m script
to run gradient descent until convergence to find the final values of θ. Next,
use this value of θ to predict the price of a house with 1650 square feet and
3 bedrooms. You will use this value later to check your implementation of the
normal equations. Don't forget to normalize your features when you make
this prediction!
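As an illustration of that last point, the snippet below shows one way the prediction could be formed. It is a minimal sketch assuming mu and sigma were returned by featureNormalize and theta was learned on the normalized data; the variable names are assumptions for the sketch, not necessarily those used in ex1_multi.m.

% Normalize the new example with the training-set statistics,
% then prepend the intercept term before multiplying by theta.
x_new = [1650, 3];              % house size in sq ft, number of bedrooms
x_norm = (x_new - mu) ./ sigma; % use mu and sigma computed from the training set
price = [1, x_norm] * theta;    % predicted price of the house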
You do not need to submit any solutions for these optional (ungraded)
exercises.
3.3 Normal Equations
In the lecture videos, you learned that the closed-form solution to linear
regression is
$$\theta = \left( X^T X \right)^{-1} X^T \vec{y}.$$
Using this formula does not require any feature scaling, and you will get
an exact solution in one calculation: there is no "loop until convergence" like
in gradient descent.
Complete the code in normalEqn.m to use the formula above to calculate θ.
Remember that while you don't need to scale your features, we still
need to add a column of 1's to the X matrix to have an intercept term (θ0).
The code in ex1.m will add the column of 1's to X for you.
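For reference, a minimal sketch of normalEqn.m built directly from this formula is shown below. Using pinv rather than inv is an assumption of style here, chosen because it behaves more safely if X'X is close to singular; it is not a requirement of the exercise.

function theta = normalEqn(X, y)
%NORMALEQN Compute the closed-form solution to linear regression
%   using the normal equations: theta = (X'X)^(-1) X' y.
theta = pinv(X' * X) * X' * y; % pinv tolerates a nearly singular X'X
end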
You should now submit the normal equations function.
Optional (ungraded) exercise: Now, once you have found θ using this
method, use it to make a price prediction for a 1650-square-foot house with
3 bedrooms. You should find that it gives the same predicted price as the value
you obtained using the model fit with gradient descent (in Section 3.2.1).
