$30
CS2035 - Assignment 2 -
Arrays and Efficiency
Introduction
This MATLAB assignment requires you to write six (6) MATLAB functions, each contained
in its own script file. The first three (avg1.m, avg2.m, avg3.m) compute the average values of
an input matrix, and the second three functions (stderr1.m, stderr2.m, stderr3.m) compute the
standard error values of an input matrix.
You will learn about:
• vectorized operations on matrices;
• just-in-time (JIT) compilation;
• writing your own vectorized operations;
• timing runtime execution of your code;
• plotting your data;
• how to automate tests on variable sized input.
This assignment is worth 11 2
3% of the course mark.
CS2035: Assignment 2 - 2018 2
Part I: Computing Averages Efficiently
For the first part of this assignment you will write several MATLAB functions that compute
the average of an input matrix in different ways.
First note how the built-in MATLAB function mean works for an m by n input matrix
A=rand(m,n). For example, if m=100 and n=5, we might obtain the output
A = rand(100,5);
mean(A)
ans =
0.5212 0.5810 0.4844 0.5070 0.4944
Thus, the mean function treats the input matrix as a sequence of 5 columns of length 100 and
computes the average of each column. Recall that for a set of m samples of a random variable
x, its average x (an estimate of the mean) is given by
x =
1
m
Xm
i=1
xi
.
You are to write three MATLAB functions that replicate this behaviour. The first function (avg1.m) should take an input matrix X, compute [m,n] = size(X), and then compute
the output vector avg using a pair of nested loops. The outer loop computes a single element
of avg, and the inner loop computes the sum of the elements of the corresponding column of
X needed to compute the given element of avg. For this function, the size of avg should increase with each iteration through the loop, i.e., do not allocate space for avg before starting
the loop. The intention here is that MATLAB will not invoke JIT compilation for this function.
Make sure that the output of your function avg1.m replicates the behaviour of mean. It
should compute the same vector of values that mean returns, and it should produce a single
value if the input is a (row or column) vector. You can ensure this latter behaviour by checking
to see if the input array X is a row vector, and if it is by then converting it into a column vector.
The second function (avg2.m) modifies avg1.m so that MATLAB’s JIT compilation will
be used. Make sure that space for any arrays is allocated before any loops begin and that any
CS2035: Assignment 2 - 2018 3
variables (including the loop indices!) are defined before the loops begin.
The third function (avg3.m) computes the same thing as your first two functions, but uses
vectorization. For this function you can use the built-in MATLAB function sum to compute
the average of the input matrix X. Note that, aside from the code from the previous functions
used to convert an input row vector to a column, this function can compute the average of a
matrix in one or two lines.
Part II: Computing Standard Errors Efficiently
For the second part of this assignment you will repeat what you did for the first part for standard error calculations rather than averages.
First note that the built-in MATLAB function std works in the same way as the mean
function for an input matrix, in that it will compute the standard error of each column of the
input matrix. For example, we might obtain the output
A = rand(100,5);
std(A)
ans =
0.2752 0.2669 0.2741 0.2861 0.2754
Thus, the functions you write should replicate this behaviour. Recall that for a set of m samples
of a random variable x, the standard error sx (estimate of the standard deviation) is given by
sx =
vuut
1
m − 1
Xm
i=1
(x − xi)
2
.
Note that for each of your functions, you will have to detect when the input is a row vector
and convert it to a column, just as for your functions from the previous part. You can reuse
the same code for this.
The first function you write for this part (stderr1.m) should take an input matrix X, compute
[m,n] = size(X), compute the average avg using your first average function avg1.m, followed
CS2035: Assignment 2 - 2018 4
by a pair of nested loops to compute the output vector stderr. The outer loop computes a
single element of stderr, and the inner loop computes the sum in the expression for sx above
for the corresponding column of X. The outer loop then completes by evaluating the remainder
of the expression for sx to compute the value of the given element of stderr. Once again,
ensure that stderr1 does not allocate space for stderr before starting the loop.
The second function for this part (stderr2.m) modifies stderr2.m so that MATLAB’s JIT
compilation will be invoked. Again, make sure that space for any arrays is allocated before any
loops begin and that any variables (including the loop indices!) are defined before the loops
begin. To make for a more interesting comparison later, use a vectorized computation for avg
(using either your avg3 or mean), i.e., do not use avg2 to compute the averages.
The third function (stderr3.m) for this part computes the same thing as your first two
functions for this part, but uses vectorization. For this function you can use the built-in
MATLAB functions sum and sqrt to compute the standard error of the input matrix X. Note
that, aside from the code from the previous functions used to convert an input row vector to
a column, this function can compute the standard error of a matrix in one or two lines. Use
either your avg3 or mean to compute the averages.
Part III: Program Validation
Write a test script file tests.m that shows that your functions compute the correct results by
comparing them to the result of the corresponding built-in MATLAB function.
This script should consist of two parts, each preceded by a section title, i.e., a commented
line starting with %% indicating the title of the section. First construct a matrix X=rand(100,6)
and a row vector y=rand(1,100). The first section computes the result of calling avg1, avg2,
and avg3 on both X and y and compares them to the output of mean(X) and mean(y) All four
function calls should produce the same result. Make sure not to suppress the output of your
function calls, and use fprintf or disp to label the output of each function call.
The second section computes the result of calling stderr1, stderr2 and stderr3 on the
same X and y and compares them to the output of std(X) and std(y). All four function calls
CS2035: Assignment 2 - 2018 5
should produce the same result. Again, label the output of each function call.
Part IV: Measuring Runtime Performance
For this part of the assignment you will write code to test the runtime performance of the
functions that you wrote. This will involve writing a script file called timing.m that will test
many runs of your code and plot the runtime for different sizes of input matrix.
You will build this file in a few stages. To begin with, start with the lines
m = 1000;
n = 1e4;
X = rand(m,n);
This will construct a random 1000 × 10000 matrix to test your functions with. Then, for each
of the standard error functions you wrote, write a code segment like
tic
stderr*(X);
elapsed = toc;
fprintf(’stderr* on %dx%d array: %f s\n’,m,n,elapsed);
to compute the execution time, where * gets replaced by the number of the function you are
testing. Then add one more line that computes and prints out the runtime for std(X). When
this is working, you should find that stderr1 is the slowest, and either stderr3 or std is the
fastest, with stderr2 in between. The results will vary when you run your script multiple times.
The second stage for building your timing script is to have your script time several runs of
each of your functions and print out only the average runtime (this is to give a fair comparison
of the algorithms). To accomplish this, add a line runs=10 to the beginning of your script file.
Then before each tic statement (four of them) start a for loop with index i running from 1 to
runs, and replace
elapsed = toc;
CS2035: Assignment 2 - 2018 6
with
elapsed(i) = toc;
This will store the runtime of each run in a vector elapsed. Then, in the fprintf statement, replace ‘elapsed’ with ‘mean(elapsed)’. When you have done this for all four functions
(stderr1, stderr2, stderr3 and std), your code should print out four lines as before, but now
with the average runtime of 10 runs of each function.
The next stage of building your timing script is to have your script test different sizes of
input matrix. To start this process, add the following two lines at the top of your script file:
pow = 4;
points = pow+1;
and replace the line reading
n = 1e4;
with
n = logspace(0,pow,points);
This will create a vector n, the entries of which are the number of columns of the input matrix
X on a given run. This way the number of columns increases by a factor of 10 each time the
functions are tested. Then add another for loop with loop index k, which starts before the
line that assigns the value to X and ends at the bottom. To get the loop to work, wherever
n appears within the loop you must change it to n(k) (there should be five of places where
n appears). Now when you run your script, it will test all four functions 5 times, i.e., pow+1
times, with exponentially increasing input size.
The last stage of building your timing script is to plot the results of the runtime tests. The
easiest way to do this is the following. Before the first for loop add a line
T = zeros(4,points);
that initializes a variable T to hold the timing data. It has 4 rows, one for each of the functions
you are testing, and points columns, one for each of the different input sizes you are testing.
To fill this array, you only need to add a line of the form
CS2035: Assignment 2 - 2018 7
T(*,k) = mean(elapsed);
before each fprintf statement, with * replaced by the numbers 1 through 4 (1 for stderr1, 2
for stderr2, etc.). Once you have done this, you can plot your results on a log-log scale simply
by calling loglog(n,T). Add a line at the end of your file to do this. Then annotate your plot
with a legend (including the four titles ’Variable Array Loops’, ’JITC’, ’Vectorized’ and
’MATLAB’), a title (’Runtime Comparison for Standard Error Algorithms’), and label the
x (’Number of Size 1000x1 Input Columns’) and y (’runtime’) axes.
Once you have all of this working, increase the value of pow to 5, which will give another
data point on your plot.
Submitting the Assignment
To prepare your assignment for submission, make a wrapper script a2.m that starts with the
code
%% CS2035B Assignment 2: Arrays and Efficiency
%% Identification
% Your Name:
% Your Student Number
and then has the following content:
• Include a section for each of your functions (6 of them) with a section heading of the form
%% FunctionName Source Code
followed by a line reading
% <includeFunctionName.m</include
where ‘FunctionName’ is replaced with the name of your function. This will include the
source code in your published output.
CS2035: Assignment 2 - 2018 8
• Include a section for each of your scripts (tests.m and timing.m). Recall that you can call
a script file simply with the name of the file. Make sure to add before each script call
a section heading (preceded with %%) to describe what the script does and a line
% <includeScriptName.m</include
to include the source code.
Note that when you run A2.m it may take some time to compute all of the runtimes for your
code, and that when you call publish on A2.m it will have to recompute the runtimes, so you
may wish to reduce the value of pow in timing.m while you are getting this set up. But make
sure to increase pow back to 5 before you submit!
To complete the assignment, make a new folder in your OWL Drop Box called A2 and place
in that folder your nine (9) MATLAB script files together with the file a2.pdf obtained by
running MATLAB’s Publish on a2.m. You must include the pdf output to receive full
marks for your assignment..