$30
Software Engineering 265
Software Development Methods
Assignment 4
Programming environment
For this assignment you must ensure your work executes correctly on the Linux machines
in ELW B238. You are free to do some of your programming on your own computers; if
you do this, give yourself a few days before the due date to iron out any bugs in the C
program you have uploaded to the lab machines.
Objectives of this assignment
Revisit the C programming language, this time using dynamic memory allocation.
Learn and implement Move-to-front (MTF) and Run-length encoding (RLE)
algorithms that are popularly used in the compression scheme.
Use git to track changes in your source code and annotate the evolution of your
solution with “messages” provided during commits.
Test your code with the built-in testing functions (No text input files).
Be prepared to justify your script’s handling of errors for this assignment.
Page 2 of 5
This assignment: sengencode.c
For this assignment you will write a C program that can perform move-to-front
encoding of its contents followed by run-length encoding. In essence, you will be
implementing run-length encoding using dynamic memory in C, the
implementation of MTF encoding algorithm has been given to you in order to
help you complete the rest of your C program.
Step 1: Understand Move-to-front coding
Move-to-Front (MTF) Coding is an adaptive coding scheme where frequentlyappearing symbols – for our assignment “symbols” equal “chars”, including such char
codes as \x03 and \x0a – are replaced with numbers that behave as indices. MTF acts
in a manner somewhat similar to the way we might use a vertical stack of books. If we
need a book from the middle of the pile, we retrieve it and when finished put the book
back on top of the pile.
The figure below shows a string of input symbols (“abcabcaaaaaaaab”), the output
generated by the encoding, and finally the order in which symbols appear in a list
after each input symbol is processed. Notice the cases when a symbol is moved from
a position within the word list to the top position of the list.
In this assignment, we will use an initial static large table from ASCII to index
up to 256 characters. However, in a more general case, you can improve the given
MTF algorithm to be adaptive to any new characters (e.g. characters from Unicode),
but this is NOT required in this assignment. For example, when a new character
appears in the input we can (a) output the code corresponding to the first unused
position in the list and (b) follow this with the character. If a character has already
appeared, then we find its position in the list, output that position, and the move the
character to the front of the list. A decoder can read the output and not only build the
list on the fly, but also use codes (i.e., the “1 2 3 3” sequence above) to retrieve a
character already in the list. (Codes represents the position of the char in the list
before it is moved to the top.)
Page 3 of 5
You may have noticed above that the first position in the list is numbered 1 instead of
0. There is a reason for this, and it is described in the next step of implementing RLE
algorithm.
Here is what it might look like to call mtf_encode() in a C program:
…
char s[] = "abcabcaaaaaaaab";
int len = strlen(s);
int *code = (int *) malloc(sizeof(int) * len);
mtf_encode(s, len, code);
print_mtf_encode(code);
…
--------------------------------------------------------
$ gcc mtfencode.c –o mtfencode
$ ./mtfencode
[1, 2, 3, 1, 2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 3]
Note:
The above result is based on the general case which uses a dynamic table to index
the characters. This is the reason why the letter ‘a’ has the index 1. However, in the
given sengencode.c file, you will see a larger index number for the letter ‘a’ because
the static size table was created from ASCII and ‘a’ is in the middle of the table.
You should notice the MTF encoding is reversible. Simply maintain the same initial
table and decode by replacing each index number in the encoded stream with the
letter at that index in the list.
Assuming we have an initial table as “abcdefghijklmnopqrstuvwxyz”. You take the
number "2" of the encoded block and look it up in the list, which results in "b". Then
move the "b" to front which results in (bacdef...). Then take the next "2", look it up in
the list, this results in "a", move the "a" to front… etc.
Page 4 of 5
Step 2: Implement Run-length Encoding and Decoding
Although MTF encoding does not reduce the size of data, run-length encoding (RLE)
can result in size reduction. In fact, it is one of the simplest forms of data compression.
Quite simply, RLE replaces repeatedly-occurring symbols (e.g., chars, or integers)
with a count of those symbols. If the symbol + count requires less memory to
represent than original sequence of repeated symbols, then the final data size is
reduced.
In principle RLE can be used with any long sequence of symbols, but for our
assignment we will only deal with long sequences of 1s. (Being able to get long
sequences of 1s was another story with applying the famous Burrows–Wheeler
transform, you can learn more details by searching online if you have time.)
The figure below shows the MTF encoding from our example with its RLE
equivalent.
The sequence of seven 1s in the MTF encoding are replaced in the RLE version with 0
followed by 7. That is, the code 0 is reserved to mean “repeated 1s” with the following
number being the length of the repeat. This replacement will only ever be applied to
sequences of 1s with a length of three or greater. (Now you should see why 0 is not
used as an index in the MTF encoding.)
Going backwards from RLE to MTF is straightforward (i.e., replace all 0 codes with
the right numbers of 1s). The three left-blank functions in the C program should be
your target in this assignment.
Exercises for this assignment
1. Write your program sengencode.c program in the A4/ directory within your
git repository.
2. The tester function run_length_test() is already given in the C program. You
can use it to test and debug your program. Refrain from writing the program
all at once, and budget time to anticipate for when “things go wrong”. There
are 5 built-in testing cases.
3. Use git add and git commit appropriately. While you are not required to use
git push during the development of your program, you must use git push in
order to submit your assignment.
Page 5 of 5
4. Reasonable run—time performance of your script is expected. None of the
test cases should take longer than 15 seconds to complete on a machine
in ELW B238.
What you must submit
A C program named sengencode.c should be added and committed within
your git repository as it will be your solution to Assignment #4. Ensure your
work is committed to your local repository and pushed to the remote
before the due date/time.
A text file named “error_handling.txt” which enumerates the errors that
are addressed by your submission.
Evaluation
Our grading scheme is relatively simple.
Requirement Marks
File “sengencode.c” compiles and runs without errors. 1
The program is clearly written and uses functions appropriately
(i.e., is well structured).
1
Errors have been enumerated and are either suitably handled or a
sensible response strategy has been suggested.
2
Code passes the RLE encoding in the 5 built-in test cases 3
Code passes the RLE decoding and recovers the original messages
in the 5 built-in test cases
3
Total 10