Starting from:

$29

Assignment 7 Huffman encoding


// ~ Overview ~ //

Huffman encoding is a loss-less compression technique that is used
widely in such things as JPEG compression and MP3 compression. It was
developed in the 1950s (!) by David Huffman, who was doing his PhD at
MIT. Huffman encoding is still used because it is simple and, under
certain plausible conditions, it is optimal, which means that someone
*proved* that there is no better way to compress data. (!)

Please read the entire attached PDF file from Spring 2013. This file
includes the relevant theory for Huffman encoding.

To complete this assignment, you must write ten functions, as listed
in "answer07.h".

// ~ Hard ~ //

THIS ASSIGNMENT IS HARD. DO NOT LEAVE IT TO THE LAST MINUTE.

// ~ Learning Goals ~ //

(1) Recursion
(2) Linked-lists
(3) Binary Trees
(4) Binary files
(5) Bit twiddling
(6) Memory management

// ~ Assignment Files ~ //

+ answer07.h: Header file contains the declarations that you must
implement. These functions must be implemented in a file called
"answer07.c".
+ treefun.c: This source file contains some auxiliary functions that
are useful for displaying trees, and linked-lists of trees. There is
also a function that decodes a Huffman file, once the header has been
read. This last function should be useful to look at, because it shows
an example of reading a file bit by bit.
+ treefun.h: Header file for treefun.c. There is no need to edit this
file.
+ example.c: Some example code that shows how to write a program that
reads in a Huffman encoded file (either text or binary header), and
prints information about it. This code cannot be run until after
implementation of the functions in "answer07.c"
+ read-header-example.text: This file shows how a Huffman encoding
tree is built step-by-step. Please see the "Hints" for more
information.
+ README: this file.
+ tester: for testing your assignment
+ figures: a folder of jpeg images that illustrates how a tree is
built.
+ inputs: a folder that contains 13 input testcases. Six files have
text headers, and seven files have binary headers. You will need to be
able to process all of these files without any errors, including
memory errors.
+ expected: a folder that shows the expected output for each input
file, assuming you ran the program in 'example.c'
+ output-tester: When you run the tester, it will save the output of
each testcase and the valgrind log for each testcase in separate files
inside of this directory. Please review these files if you fail a
testcase, so that you can recreate the problem yourself.

// ~ Submitting Your Assignment ~ //

You must submit one zip file to blackboard. This zip file must
contain:
(1) answer07.c

You create the zip file using the following command.

zip pa07.zip answer07.c

If your zip file does not meet the above specification, then you may
get zero for this assignment. YOU HAVE BEEN WARNED. Following the
instructions is *part* of getting the assignment correct. So please
follow the instructions.

// ~ Determining Your Mark ~ //

The tester program is there to ensure that you have followed the
instructions correctly.

Run the tester program as follows:

./tester


// ~ Hints ~ //

(*) Help, I don't know where to start?

Start by writing *and* test every function except for:

(1) void Stack_popPopCombinePush(Stack * stack);
(2) HuffNode * HuffTree_readTextHeader(FILE * fp);
(3) HuffNode * HuffTree_readBinaryHeader(FILE * fp);

There is no point attempting the last two functions until after you
are certain that you have everything else correct. Many of these
functions are extremely short and simple.

See the last hint for more information on (1) and (2).

--------------------------------------------------------------------------------

(*) What is the point of creating a struct "Stack" that only contains
a single pointer?

This is a subtle design issue that illustrates how *encapsulation* can
simplify code. The Stack interface is six very simple functions that
operate on a Stack pointer. The person who uses this code (okay
yourself) never needs to worry about the underlying
implementation. Simply use the six functions. When you modify the
stack, you never need to managed the pointer to the head element, or
consider the case when it is NULL. Simple. (!)

By encapsulating the Stack logic, your code will be simpler, and
easier to keep error free.

--------------------------------------------------------------------------------

(*) How do I read a byte or a bit from a binary file?

Firstly, if you haven't completed *and* tested
HuffTree_readTextHeader(...), then you are wasting your time. Do that
first.

A big hint for doing this is contained in "treefun.c". If you look at
the Huffman_applyTree(...) function, you will see an example of how to
read a file bit by bit.

Don't forget that a byte is just 8 bits. Therefore, if you want to
read a byte, you simply read 8 bits, and compose them into a byte.

--------------------------------------------------------------------------------

(*) Compose bits into a byte? This bit twiddling is confusing!

If you want to set a bit, use logical or: '|'
If you want to clear bits, use logical and: '&'
If you want to position a bit, use bit-shifts: '<<' and ''

So, for example, let:

unsigned char X = 0b10010110;

To set the smallest (last) bit:

printBits(X | 0b00000001); // 0b10010111

To clear the last four bits:

printBits(X & 0b11110000); // 0b10010000

To position a bit in the sixth position:

unsigned char six = 1 << 6;
printBits(six); // 0b01000000

Note that I've numbered the bits as follows:

// Bit# 76543210
// 0b01000000

The reason is that it follows the value of each bit.
For example, if bits "0, 1 and 5" are set, then the
value of the number is:

Bit# 76543210
0b00100011 = 2^5 + 2^1 + 2^0 = 32 + 2 + 1 = 35

To set the sixth bit:

// Bit# 76543210
// 0b10010111
// or 0b01000000
// =
// 0b11010110
printBits(X | six);

To clear every bit except for the sixth

// 0b10010111
// and 0b01000000
// =
// 0b00000000
printBits(X & six);

To test if the sixth bit is set:

if(X & six != 0) {
// The sixth bit was set.
} else {
// Was not set
}

The code for printBits is as follows:

/**
* Print the bits of X
* In the for loop, we
* (1) take the value of X and right shift it by 'ind' ((X ind)
* (2) clear out all bits except for the zeroth bit. & 0x01)
* 0x01 is hexidecimal for 1, and in binary is: 0b00000001
*/
void printBits(unsigned char X)
{
printf("0b");
int ind;
for(ind = 7; ind = 0; --ind)
printf("%d", ((X ind) & 0x01));
printf("\n");
}

--------------------------------------------------------------------------------

(*) Okay, I'm really, really, stuck on reading the text header, and
with pop-pop-combine-push(...)

There is no point reading this section until you have implemented the
first seven functions in answer07.h, and furthermore, that you have
read the pdf document, and consulted the class notes.

Please look at the file "read-header-example.text". It shows how the
stack and encoding tree change as the header is read for the file:
"inputs/01-tibbs.txt-huff". Your code should replicate exactly the
same stack of trees as it reads bytes from the header of this file.

The function "Stack_print(...)" is your friend.

More products