Computer Science 245
Spring 2009
Project 2:  Huffman Encoding
Due Friday, March 20th, 3:30 p.m.


For your second project, you will write a program that compresses and uncompresses files using Huffman coding.  The steps for compressing a file are described under File Compression below, and the steps for uncompressing a file are described under File Decompression.

If your program is called with the "verbose" flag (-v), you will also need to print some debugging information to standard output.  If your program is called with the "force" flag (-f), then the file will be compressed even if the compressed file would be larger than the original file.

File Compression

Reading input files

To read in the input files, you will use the provided TextFile class, which has the following methods:

Building Huffman Trees

Huffman trees are built from the leaves up.  See the visualizations for examples of building Huffman trees.  The class notes for this project also have a thorough description of building Huffman trees.
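As a rough illustration of building from the leaves up, here is a minimal sketch that uses java.util.PriorityQueue to repeatedly merge the two lightest subtrees.  The class name HuffmanBuildSketch, the Node class, and the 256-entry frequency array are assumptions made for this example; they are not part of the provided files, and the class notes remain the reference for the expected algorithm.

```java
import java.util.PriorityQueue;

class HuffmanBuildSketch {
    /** A tree node: leaves carry a character, interior nodes carry two children. */
    static class Node implements Comparable<Node> {
        final int weight;      // total frequency of all leaves below this node
        final char symbol;     // meaningful only for leaves
        final Node left, right;

        Node(char symbol, int weight) {            // leaf
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {              // interior node
            this.symbol = '\0';
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(weight, other.weight);
        }
    }

    /** Builds a tree from freq[c] = number of occurrences of character c. */
    static Node build(int[] freq) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int c = 0; c < freq.length; c++) {
            if (freq[c] > 0) queue.add(new Node((char) c, freq[c]));
        }
        if (queue.isEmpty()) return null;          // empty input: no tree
        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a, b));
        }
        return queue.poll();
    }
}
```

Note that an input with only one distinct character produces a tree that is a single leaf, which a full program would need to treat as a special case.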

Building Huffman Tables

Once the Huffman tree has been built, we will need to use it to create the codes for each character.  We can do this by doing a traversal of the tree, keeping track of the path from the root to the current node.  When a leaf is reached, we store the code (that is, the path from the root to that leaf) in our code table, at the index of the character stored at the leaf.
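The traversal described above might look like the following sketch.  It assumes that a left branch contributes a 0 and a right branch a 1, that codes are stored as strings of '0' and '1' characters, and that characters fit in a 256-entry table; the names CodeTableSketch, Node, and buildTable are hypothetical.

```java
class CodeTableSketch {
    static class Node {
        char symbol;
        Node left, right;

        Node(char symbol) { this.symbol = symbol; }                  // leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }

        boolean isLeaf() { return left == null && right == null; }
    }

    /** Returns a 256-entry table; table[c] is the code for c, or null if c is absent. */
    static String[] buildTable(Node root) {
        String[] table = new String[256];
        if (root != null) fill(root, new StringBuilder(), table);
        return table;
    }

    private static void fill(Node node, StringBuilder path, String[] table) {
        if (node.isLeaf()) {
            table[node.symbol] = path.toString();   // path from the root is the code
            return;
        }
        path.append('0');                           // assumed: left branch is 0
        fill(node.left, path, table);
        path.setCharAt(path.length() - 1, '1');     // assumed: right branch is 1
        fill(node.right, path, table);
        path.deleteCharAt(path.length() - 1);       // backtrack to the parent's path
    }
}
```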

Checking File Sizes

Once you have built the tree & table, you can compute the sizes of the compressed and uncompressed files.

If the compressed file is smaller than the original file (or the code was called with the -f option), go ahead with the compression.  Otherwise, do not compress the file.
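The size comparison can be sketched as follows.  This example assumes 8 bits per character in the original file and takes the cost of the serialized tree and any other header data as a hypothetical headerBits parameter, since that cost depends on how you choose to write the tree.  It compares sizes in bits and does not round the compressed size up to whole bytes.

```java
class SizeCheckSketch {
    /** Bits needed for the encoded body: sum over characters of count * code length. */
    static long encodedBits(int[] freq, String[] codes) {
        long bits = 0;
        for (int c = 0; c < freq.length; c++) {
            if (freq[c] > 0) bits += (long) freq[c] * codes[c].length();
        }
        return bits;
    }

    /** Bits in the original file, assuming 8 bits per character. */
    static long originalBits(int[] freq) {
        long chars = 0;
        for (int count : freq) chars += count;
        return 8 * chars;
    }

    /** headerBits (hypothetical) is the cost of the tree plus any other header data. */
    static boolean shouldCompress(int[] freq, String[] codes, long headerBits, boolean force) {
        long compressed = headerBits + encodedBits(freq, codes);
        return force || compressed < originalBits(freq);
    }
}
```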

Printing Huffman Files

To assist in printing out compressed files, the class BinaryFile is provided, which has the following methods:


To print a Huffman tree to the output file, we merely do a preorder traversal of the tree, printing out all of the nodes in the tree.  We will need to encode which nodes are leaves, and which nodes are interior nodes.  We can do this by:
The BinaryFile class has methods writeBit and writeChar to assist you.  You may use some other method of your choice for serializing trees if you wish, but make sure that your method does not require more space!
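For illustration only, here is one possible preorder serialization.  The marker scheme is an assumption made for this sketch (a 1 bit for an interior node, a 0 bit for a leaf followed by the leaf's character as 8 bits, high bit first), and the bits are appended to a StringBuilder as '0' and '1' characters rather than written through BinaryFile, whose method signatures are not shown in this handout.

```java
class TreeWriterSketch {
    static class Node {
        char symbol;
        Node left, right;

        Node(char symbol) { this.symbol = symbol; }                  // leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }

        boolean isLeaf() { return left == null && right == null; }
    }

    /** Appends a preorder encoding of the tree to out, one character per bit. */
    static void writeTree(Node node, StringBuilder out) {
        if (node.isLeaf()) {
            out.append('0');                         // assumed marker: 0 = leaf
            for (int bit = 7; bit >= 0; bit--) {     // then the character, high bit first
                out.append(((node.symbol >> bit) & 1) == 1 ? '1' : '0');
            }
        } else {
            out.append('1');                         // assumed marker: 1 = interior node
            writeTree(node.left, out);               // preorder: node, left, right
            writeTree(node.right, out);
        }
    }
}
```

Under this assumed scheme, a tree with n leaves has 2n - 1 nodes and costs (2n - 1) + 8n = 10n - 1 bits.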

Encoding File

Once the Huffman codes have been created, and the Huffman tree has been written to the output file, we only need to go through the input file again, character by character, writing out the appropriate code for each character.  Don't forget to close the output file when you are done!
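A minimal sketch of this second pass is shown below.  To stay self-contained it takes the input as a String and returns the output as a string of '0' and '1' characters; in the project these would be replaced by reads from TextFile and bit writes to BinaryFile.  The name EncodeSketch and the string-based code table are assumptions for this example.

```java
class EncodeSketch {
    /** Concatenates the code of every input character, in order. */
    static String encode(String input, String[] codes) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            // Assumes every character of the input has an entry in the table.
            out.append(codes[input.charAt(i)]);
        }
        return out.toString();
    }
}
```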

File Decompression

Reading Huffman Tree

To read in the Huffman tree, we do a preorder traversal of the tree -- guided by the input file -- creating nodes as we go.
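The sketch below shows how the input can guide the traversal.  It assumes the same hypothetical marker scheme as a writer that emits a 1 bit for an interior node and a 0 bit plus an 8-bit character for a leaf; if you serialize your tree differently, the reader must mirror your own format.  A String of '0' and '1' characters stands in for the compressed input file.

```java
class TreeReaderSketch {
    static class Node {
        char symbol;
        Node left, right;

        Node(char symbol) { this.symbol = symbol; }                  // leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }

        boolean isLeaf() { return left == null && right == null; }
    }

    private final String bits;   // stand-in for the input file: one '0'/'1' per bit
    private int pos = 0;

    TreeReaderSketch(String bits) { this.bits = bits; }

    private boolean readBit() { return bits.charAt(pos++) == '1'; }

    private char readChar() {
        int value = 0;
        for (int i = 0; i < 8; i++) value = (value << 1) | (readBit() ? 1 : 0);
        return (char) value;
    }

    /** Rebuilds the tree in preorder: the marker bit decides whether to recurse. */
    Node readTree() {
        if (readBit()) {                 // assumed marker: 1 = interior node
            Node left = readTree();      // the left subtree was written first
            Node right = readTree();
            return new Node(left, right);
        }
        return new Node(readChar());     // assumed marker: 0 = leaf, then 8 bits
    }

    /** Number of bits consumed so far; the encoded data starts here. */
    int position() { return pos; }
}
```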

Decoding File

Once the tree has been built, decoding files is easy.  Start from the root of the tree, follow the appropriate child based on the next bit read in from the input file until a leaf is reached, and then print out the character stored at that leaf.
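The decoding walk can be sketched like this.  The example assumes that a 0 bit means the left child and a 1 bit means the right child, that the root is an interior node (at least two distinct characters), and that the bit string contains only encoded data with no padding; how to detect the end of the real data in a file is not covered here.

```java
class DecodeSketch {
    static class Node {
        char symbol;
        Node left, right;

        Node(char symbol) { this.symbol = symbol; }                  // leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }

        boolean isLeaf() { return left == null && right == null; }
    }

    /** Walks the tree bit by bit, emitting a character at every leaf. */
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            // Assumed convention: 0 = left child, 1 = right child.
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                out.append(current.symbol);
                current = root;          // start over for the next character
            }
        }
        return out.toString();
    }
}
```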

Command Line Arguments

Java allows the user to pass in command line arguments.  The input parameter to the main function is an array of strings.  If a Java main program has the prototype:

public static void main(String args[])

and the program is called with the command

% java MyProgram arg1 arg2 arg3

then args.length is 3, args[0] is "arg1", args[1] is "arg2", and args[2] is "arg3".

Your program should expect to be called as follows:

% java Huffman (-c|-u) [-v] [-f]  infile outfile

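where -c compresses infile into outfile, -u uncompresses infile into outfile, -v turns on verbose output, and -f forces compression even if the compressed file would be larger than the original file.

The flags -f and -v are optional and can be in either order, so both "-v -f" and "-f -v" are legal.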
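One way to scan the args array for this calling convention is sketched below.  The class name ArgsSketch, its fields, and the choice to throw IllegalArgumentException on bad usage are assumptions for illustration; the handout does not say how invalid command lines should be reported.

```java
class ArgsSketch {
    boolean compress, verbose, force;
    String infile, outfile;

    /** Parses: (-c|-u) [-v] [-f] infile outfile. */
    static ArgsSketch parse(String[] args) {
        if (args.length < 3 || args.length > 5) {
            throw new IllegalArgumentException("usage: (-c|-u) [-v] [-f] infile outfile");
        }
        ArgsSketch parsed = new ArgsSketch();
        if (args[0].equals("-c")) {
            parsed.compress = true;
        } else if (args[0].equals("-u")) {
            parsed.compress = false;
        } else {
            throw new IllegalArgumentException("first argument must be -c or -u");
        }
        // Everything between the mode flag and the two file names must be -v or -f,
        // in either order.
        for (int i = 1; i < args.length - 2; i++) {
            if (args[i].equals("-v")) {
                parsed.verbose = true;
            } else if (args[i].equals("-f")) {
                parsed.force = true;
            } else {
                throw new IllegalArgumentException("unknown flag: " + args[i]);
            }
        }
        parsed.infile = args[args.length - 2];
        parsed.outfile = args[args.length - 1];
        return parsed;
    }
}
```

Note that strings are compared with equals rather than ==, since == on strings compares object references in Java.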

Verbose Output

If a file is compressed with the "-v" option, you should print the following to standard output (using System.out.print(ln)):


If a file is uncompressed with the "-v" option, you should print out the following to standard output (using System.out.print(ln)):

Due Date

This project is due at 3:30 p.m. on Friday, March 20th.  The project may be turned in after Friday, but by Monday, March 23rd at 3:30 p.m., for 75% credit.  Projects turned in after 3:30 p.m. on March 23rd will receive no credit.

Program Submission & Environment


You need to submit an electronic and a hardcopy version of your code.  To submit electronically, submit the file Huffman.java (as well as all other source files that your program needs to run, including the provided files for file I/O) to the subversion repository:

https://www.cs.usfca.edu/svn/<username>/cs245/Project2

Put this subversion directory at the top of your printout, to make life easier on Ye.  You do not want the TA to be grumpy when he is grading your code!

You may develop your code in any environment that you like, but it needs to run under Linux in the labs!  While I recommend developing under Linux, you may develop in Windows if you prefer, as long as your program runs under Linux.  To compile and run your program in Linux, create a directory that contains all of the necessary .java files.  Then compile all the files with the command

% javac *.java
 
You can then run your program with the command:

% java Huffman -c <input file> <output file>

Collaboration

It is OK for you to discuss solutions to this program with your classmates.  However, no collaboration should ever involve looking at a classmate's source code!  It is usually extremely easy to determine that someone has copied a program, even when the individual doing the copying has changed identifier names and comments.

Supporting Files