Week 01 Lab Exercise Linked Lists, Performance

Your shopping cart is empty.

COMP2521 20T1 - Week 01 Lab Exercise

Marks
Demo
Submit
Deadline
COMP2521 20T1 Data Structures and Algorithms
Week 01 Lab Exercise
Linked Lists, Performance
Objectives
to re-acquaint you with C programming and ADTs
to manipulate a linked-list data structure
to learn about COMP2521 programming style
to learn (or remind yourself) about Makeles
to learn about shell scripts to automate repetitive tasks
to do some simple performance analysis
Admin
5=outstanding, 4=very good, 3=adequate, 2=sub-standard, 1=hopeless
in the Week 01 Lab or in the Week 02 Lab
give cs2521 lab01 IntList.c timing.txt or via WebCMS
submit by 5pm Sunday of Week-01
Background
At the end of COMP1511, you dealt with linked lists. Over the break, you haven't forgotten linked lists
(have you?), but a bit of revision never hurts, especially when many of the data structures we'll deal with
later are based on linked lists. So ... on with this simple linked list exercise ...
Setting Up
To keep your les manageable, it's worth doing each lab exercise in a separate directory (folder). I'd
suggest creating a subdirectory in your home directory called "cs2521", and then creating a
subdirectory under that called "labs", and then subdirectories "week01", "week02", etc. Let's assume
that the directory you set up for this lab is Week01LabDir.
Change into your Week01LabDir directory and run the following command:
$ unzip /web/cs2521/20T1/labs/week01/lab.zip
If you're working at home, download lab.zip by right-clicking on the above link and then run the above
command on your local machine.
In the example interactions, we assume that you are working at a Linux shell, and the shell is
giving you a $ prompt. All the text that you type is in monospace bold font and all the text
that the shell types at you is in monospace font.
If you've done the above correctly, you should now nd the following les in the directory:
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 2/9
Makefile
IntList.h
IntList.c
useIntList.c
randList.c
timing.txt
a set of dependencies used to control compilation
interface denition for the IntList ADT
implementation for the IntList ADT
main program for testing the IntList ADT
main program for generating random sequences of numbers
template for your results le; you need to add more rows
Before you start using these programs, it's worth looking at the code. Are there any constructs that you
don't understand? Try to work them out with your lab partner, or ask your tutor.
Once you've understood the programs, the next thing to do is to run the command:
$ make
It's worth taking a look at the Makefile to see if you can work out what it's doing. Don't worry if you
don't understand it all; we'll be taking a longer look at make in later labs. Note: you will need to run make
to recompile the system each time you make changes to the source code le and are ready to test the
program again.
The make command will produce messages, which show the commands it is running, and will eventually
leave two executable les in your working directory (along with some .o les).
useIntList
This is the executable for the useIntList.c program. It reads a list of integers from standard
input, then attempts to make a sorted (ascending order) copy of the list, and then prints both the
original and the sorted lists. It doesn't work at the moment because the function to produce the
sorted list is incomplete.
randList
This is the executable for the randList.c program. It writes, on its standard output, a list of
random numbers in the range 1..999999. It takes one command-line argument to indicate how
many numbers it should write, and another optional argument to give a seed for the random
number generator. Note that it does not attempt to eliminate any duplicate values produced; if
you generate a large enough number of values, duplicates are inevitable.
You can run the useIntList command by typing the command name, followed by return. It will then
patiently wait for you to type some numbers, and will store them in a list, display the list, then call the
function that is supposed to sort the list, and then display the "sorted" list. Of course, this doesn't
currently work, because the list sorting function is incomplete. You can, however, fake it by typing the
numbers in in order, e.g.
$ ./useIntList -v
1 2 3 4
... type control-D to finish your input
Original:
1
2
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 3/9
3
4
Sorted:
1
2
3
4
When useIntList eventually works properly, this is the kind of output you'll expect to see ... except
that you won't be giving it sorted lists as input. To see the existing behaviour on an unsorted list, type a
few numbers not sorted in ascending order, e.g.
$ ./useIntList -v
1 3 2
once again, type control-D to finish your input
Original:
1
3
2
Sorted
1
3
2
useIntList: useIntList.c:20: main: Assertion `IntListIsSorted(myOtherList)' failed.
Aborted
If you omit the -v command-line parameter, the useIntList program only displays the nal sorted list
(which is not yet sorted, of course).
$ ./useIntList
1 3 2
once again, type control-D to finish your input
1
3
2
useIntList: useIntList.c:20: main: Assertion `IntListIsSorted(myOtherList)' failed.
Aborted
Now it's time to play with the list generator. If you execute the command:
$ ./randList 10
it should display a list of 10 random numbers. If you run it again, you'll get a dierent list of random
numbers. Enjoy generating small lists of random numbers until you are bored.
If you then execute the command:
$ ./randList 10 | ./useIntList
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 4/9
the randList command will generate 10 random numbers and give them to the useIntList command
which will print the list of numbers twice (separated by the word Sorted) and then fail the assertion as
above.
If you're not familiar with Unix/Linux conventions, the | is a "pipe" that connects two
commands, e.g. C1|C2. The standard output (stdout) of the command C1 will be directly
connected to the standard input (stdin) of the command C2, so that whatever C1 writes, C2
reads.
You can adjust the number of numbers that randList generates via the command-line argument, e.g.
$ ./randList 100 | less
will produce 100 random numbers. If you want to change the range of numbers produced (e.g. to only
make numbers in the range 1..10) edit the randList.c code and add extra command-line arguments to
set the range.
If you supply only one command-line argument, randList will generate a completely random (well,
pseudo-random) sequence each time you run it. If you want to generate a large sequence of pseudorandom numbers, and be able to generate the same sequence consistently, you can use the second
command-line argument to specify a seed for the random number generator. If you give the same seed
each time you run randList, you'll get the same sequence each time. To see the dierence between
using and not using the second command-line argument, try the following commands:
$ ./randList 10
$ ./randList 10
$ ./randList 10
$ ./randList 10 13
$ ./randList 10 13
$ ./randList 10 13
The rst three commands will generate dierent sequences. The second three commands all generate
the same sequence.
While ./randList is useful for producing large amounts of data, you don't need to use ./randList to
produce input to ./useIntList. An alternative is to use the echo command and enter the numbers
yourself, e.g.
$ echo 5 4 3 2 1 | ./useIntList
Another alternative, which will be more useful for testing, is to use the seq command, in combination
with the sort command, e.g.
$ seq 10
# gives 1 2 3 4 5 6 7 8 9 10
$ seq 10 | sort -n
# gives 1 2 3 4 5 6 7 8 9 10
$ seq 10 | sort -nr
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 5/9
# gives 10 9 8 7 6 5 4 3 2 1
$ seq 10 | sort -R
# gives 1..10 in a random order
The -nr argument to the sort command tells it to treat the input as numbers and sort them in reverse
order. The -R argument to the sort command tells it to put the input in random order. You can nd
more details about sort via the command man sort.
One thing to remember is that sort orders items lexically, not numerically, by default. The following
pipeline may not produce what you expect. Try it.
$ seq 10 | sort
Task 1
The IntListInsertInorder() function is incomplete. In fact, it's just a stub function that invokes the
IntListInsert() function, which inserts the number, but not in order. You should re-write the
IntListInsertInorder() function so that it takes an IntList and an integer and inserts the value into
the appropriate place in the list, so that the list always remains sorted (in ascending order). The
function should fail if the original list (before insertion) is not sorted. Don't forget to handle the cases of
(a) empty list, (b) smallest value, (c) largest value, (d) second-smallest value, (e) second-largest value, (f)
value somewhere in the middle. Why were these kinds of values chosen? Discuss with your partner and
propose test cases that test these and any other cases that you can think of.
Make a directory called tests, and place les containing your test cases in that directory with one test
case in each le. A useful name strategy is to call the test les 01, 02, etc. You could create test les in
various ways, e.g.
$ mkdir tests
$ seq 10 | sort -R tests/01
$ seq 10 | sort -nr tests/02
$ ./randList 10 11 tests/03
etc. etc. etc.
In order to check that your program is producing the correct results, you could compare it to the output
of a known correct sorting program. The diff command can help here (see man diff for details). For
example, you could put each test case in a le, then run both your useIntList program and the built-in
sort command, save the results of each output in a le, and then compare the les using diff. If your
program is correct, there should be no dierence. The following shows an example of how to do this:
$ sort -n < tests/01 tests/01.expected # generate correct result
$ ./useIntList < tests/01 tests/01.observed # generate *your* result
$ diff tests/01.expected tests/01.observed # if correct, no output
If you produce a decent number of tests (10-20), as you should, then testing them one by one using the
above is a bit tedious. You could simplify carrying out the testing by writing a small shell script like:
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 6/9
#!/bin/sh
for t in 01 02 03 04 05 ... and the rest of your test files
do
echo === Test $t ===
sort -n < tests/$t tests/$t.expected
./useIntList < tests/$t tests/$t.observed
diff tests/$t.expected tests/$t.observed
done
rm tests/*.expected tests/*.observed
If you put the above in a le called e.g. run_tests, and then run the command:
$ sh run_tests
the script will run all your test cases and any that fail will either produce an assertion message or show
you the dierence between your output and the expected output.
You will have some condence that your IntListInsertInorder() function is working properly when,
for all test cases, the assertions in useIntList.c no longer fail.
Task 2
Once IntListInsertInorder() is working correctly, run some tests to compare the time taken by
useIntList with the time taken by the Unix sort command for the same task.
The rst thing to do is to check that both commands are producing the same result (otherwise, it's not
useful to compare them). Try the following commands:
$ ./randList 1000 nums
$ ./useIntList < nums out1
$ sort -n < nums out2
$ diff out1 out2
If the diff command gives no output, then the les have no dierence (i.e. the observed output is he
same as the expected output). If you try the above a number of times with dierent sets of numbers
each time and get the same result, then you can be more condent that both commands are producing
the same result. (However, you cannot be absolutely certain that they will always produce the same
result for any input. Why not?)
Note that you need to use the -n option to sort because the default sorting order is lexcial, not
numeric. If you don't use -n, then e.g. 111 is treated as being less than 12. (See man sort for details.)
The next thing to do is to devise a set of test cases and collect some timing data. There is a command
called time that will produce the timing data for you. Use it as follows:
$ ./randList 100000 nums
$ time ./useIntList < nums out1
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 7/9
$ time sort -n < nums out2
As before, check that both commands are producing the same result. This time, however, you will also
see output that contains some timing information (note that the output format may vary depending on
which shell you're using):
user time time spent executing the code of the command
system time time spent executing system operations, such as input/output
elapsed/real time wall-clock time between when the command starts and nishes
The elapsed/real time is aected by the load on the machine, and so is not reliable. The system time
should be similar for both commands (same amount of input/output). The value that's most useful is
the user time, which represents the time that the program spent doing computation. Note that the
time will vary if you run the same command multiple times. This is because the timing uses sampling
and may sample the program at dierent points during the dierent executions. Collect results from a
number of executions on the same output and take an average.
You will need to use large numbers of values (large argument to randList) to observe any appreciable
dierence. Of course, if you use very large numbers of values, the nums le may become too large to
store in your directory. In this case, you could store nums under the /tmp directory:
$ ./randList 10000 /tmp/$USER.nums
$ time ./useIntList < /tmp/$USER.nums /tmp/$USER.out1
$ time sort -n < /tmp/$USER.nums /tmp/$USER.out2
$ diff /tmp/$USER.out1 /tmp/$USER.out2
It would also be worth taking your /tmp/$USER.nums le and modifying it to check the timings for
sorted and reverse sorted inputs:
$ sort -n /tmp/$USER.nums alreadySortedWithDuplicates
$ sort -nr /tmp/$USER.nums reverseSortedWithDuplicates
As well as lists produced by randList, it would be worth trying lists produced using the seq/sort
combinations mentioned above. These will be dierent to the lists produced by randList in that they
won't have any duplicate values. For example, you could generate data les like:
$ seq 10000 alreadySortedNoDuplicates
$ seq 10000 | sort -nr reverseSortedNoDuplicates
$ seq 10000 | sort -R randomOrderNoDuplicates
and use these in place of /tmp/$USER.nums to generate more timing data.
Once the data les get big enough, and once you've checked that both programs are producing the
same output for a wide selection of cases, you can avoid generating and comparing large output les by
running the timing commands as follows:
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 8/9
$ time ./useIntList < /tmp/$USER.nums /dev/null
$ time sort -n < /tmp/$USER.nums /dev/null
The above commands do all of the computation that we want to measure, but send their very large
output to the Unix/Linux "data sink" (/dev/null), so it never takes up space on the le system.
If you do write results to les, don't forget to remove them after you've nished the lab.
You should take note of the timing output and build a collection of test results for dierent sizes/types
of input and determine roughly the point at which it becomes impractical to use useIntList as a sort
program (rather than using the Unix sort command).
You'll nd that the timing data for relatively small lists (e.g. up to 10000) doesn't show much dierence
between useIntList and sort, and shows time pretty close to zero. Try using list sizes such as 5000,
10000, 20000, 50000, 100000, and as high as you dare to go after that. Also, try varying the kinds of
input that you supply. Consider the cases for the order of the input: random, already sorted, reverse
sorted. Also, consider the proportion of duplicate values in the input, e.g. all distinct values, some
duplicates. Note that ./randList will most likely include duplicates as soon as you specify a relatively
large list size.
Put your results and some brief comments explaining the results, in a le called timing.txt. The le
should contain a table of results, with major rows dealing with a particular size of data, and sub-rows
dealing with the order of the input. You should have three columns: one to indicate how many runs of
each program were used to compute the average time-cost, one to give the average time-cost for
useIntList and the other to give the average time-cost for sort. The table should look something like
the following:
Input
Size
Initial
Order
Has
Duplicates
Number
of runs
Avg Time
for useIntList
Avg Time
for sort
5000 random no N T1sec T2sec
5000 sorted no N T1sec T2sec
5000 reverse no N T1sec T2sec
5000 random yes N T1sec T2sec
5000 sorted yes N T1sec T2sec
5000 reverse yes N T1sec T2sec
10000 random no N T1sec T2sec
10000 sorted no N T1sec T2sec
10000 reverse no N T1sec T2sec
10000 random yes N T1sec T2sec
10000 sorted yes N T1sec T2sec
10000 reverse yes N T1sec T2sec
etc. etc.
3/26/2020 COMP2521 20T1 - Week 01 Lab Exercise
https://cgi.cse.unsw.edu.au/~cs2521/20T1/labs/week01/index.php 9/9
Note that, given the variation in timing output, it's not worth considering more than two signicant
gures in your averages.
You should also write a short paragraph to explain any patterns that you notice in the timing results.
Don't just re-state the timing result in words; try to explain why they happened.
If you're looking for a challenge, and know how to write scripts, write e.g. a Linux shell script to
automate the testing for you, and maybe even produce the timing le for you.
Another interesting challenge would be to plot graphs using gnuplot, the R system (available as the
Linux command R, a single upper-case letter), Mathematica, Matlab, or a similar tool. I'd suggest
plotting a separate graph for each type (random, ascending, descending) for a range of values (e.g.
5000, 10000, 20000, 50000, ... as far as you can be bothered, given how long larger data les take to
sort). Alternatively, produce your timing.txt as a tab-separated le and load it into a spreadsheet,
from where you could also produce graphs.
If you're feeling brave, try to nd out how large input sort can deal with. However, you should try such
testing on a machine where you are the sole user. Also, you shouldn't try to store the data les; use
commands like:
$ seq SomeVeryLargeNumber | time sort -n /dev/null
$ seq SomeVeryLargeNumber | sort -nr | time sort -n /dev/null
$ seq SomeVeryLargeNumber | sort -R | time sort -n /dev/null
Submission
You need to submit two les: IntList.c and timing.txt. You can submit these via the command line
using give or you can submit them from within WebCMS. After submitting them (either in this lab class
or the next) show your tutor, who'll give you feedback on your coding style, your timing results, and
award a mark.
Have fun, jas
COMP2521 20T1: Data Structures and Algorithms is brought to you by
the School of Computer Science and Engineering at the University of New South Wales, Sydney.
For all enquiries, please email the class account at cs2521@cse.unsw.edu.au
CRICOS Provider 00098G

Shopping cart

US$0

Week 01 Lab Exercise Linked Lists, Performance

More products