Computer Science 245
Spring 2009
Project 3:  Sorting out Sorting
Due Friday, April 17th, 3:30 p.m.


For your third project, you will implement a number of sorting algorithms, and then test their performance.  This project will consist of not just coding, but also testing your code using large data sets.  Note that when all of your code is complete and debugged, you still have a fair amount of work to do -- so start early!

Coding Sorting Algorithms

For Part 1 of the assignment, you will need to write 8 sorting algorithms: Bubble Sort, Insertion Sort, Shell Sort, Bucket Sort, Heap Sort, Quicksort, Merge Sort, and Radix sort. All of these sorting algorithms should be able to sort the input array in either ascending or descending order. 
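
As an illustration of supporting both orderings in one routine, here is a minimal sketch of insertion sort over a subrange of an array. The class name, the method signature, and the boolean `ascending` parameter are assumptions made for this example; the detailed requirements for each algorithm specify the signatures the grading program actually expects.

```java
public class InsertionSortSketch {
    // Sorts list[low..high] in place.
    // "ascending" is a hypothetical flag: true for ascending, false for descending.
    public static void insertionSort(int[] list, int low, int high, boolean ascending) {
        for (int i = low + 1; i <= high; i++) {
            int key = list[i];
            int j = i - 1;
            // Shift right every element that must come after key in the chosen order
            while (j >= low && (ascending ? list[j] > key : list[j] < key)) {
                list[j + 1] = list[j];
                j--;
            }
            list[j + 1] = key;
        }
    }
}
```

Only the comparison changes between the two orderings, so the rest of the algorithm can be shared.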

In addition to the above 8 algorithms, you will also need to write versions of both insertion sort and merge sort that sort linked lists, in either ascending or descending order.  Finally, you will use data drawn from tests of your sorting algorithms to create an optimized hybrid sort. Your sorting project will contain:
It is critically important that:
Otherwise, the grading program will not function correctly, and you will lose points!  Also, check to make sure your sorting algorithms are correct.  Several of these algorithms (most notably bucket sort) are complicated, and it is easy to make a subtle mistake when coding.  Be sure to read the detailed requirements for each sorting algorithm.  You should not use any instance variables for this assignment, only local method variables (though you can have final instance variables if you want symbolic constants).  You are of course allowed to write as many extra (private) helper functions as you would like.  
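
For the linked-list variants mentioned above, the following is a rough sketch of insertion sort on a singly linked list that uses only local variables and a private helper. The `Node` class, the method signature, and the `ascending` flag are all hypothetical; the node type and signatures required for grading may differ.

```java
public class LinkedInsertionSketch {
    // Hypothetical node type, defined here only so the example is self-contained.
    public static class Node {
        public int data;
        public Node next;
        public Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Returns the head of the sorted list. Nodes are relinked, not copied.
    public static Node insertionSort(Node head, boolean ascending) {
        Node sorted = null;                      // head of the sorted portion
        while (head != null) {
            Node current = head;                 // detach the next unsorted node
            head = head.next;
            if (sorted == null || before(current.data, sorted.data, ascending)) {
                current.next = sorted;           // insert at the front
                sorted = current;
            } else {
                Node p = sorted;                 // walk to the insertion point
                while (p.next != null && !before(current.data, p.next.data, ascending)) {
                    p = p.next;
                }
                current.next = p.next;
                p.next = current;
            }
        }
        return sorted;
    }

    // True if a must come strictly before b in the requested order.
    private static boolean before(int a, int b, boolean ascending) {
        return ascending ? a < b : a > b;
    }
}
```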

Efficiency Testing

After you have coded your algorithms, you need to test them, to see how long each sorting algorithm takes to run.  You should test each algorithm on both random and sorted lists of sizes 5-20, 50, 100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000, 50000, and 100000.  You can get a random list using the Java class java.util.Random.  To test the speed of your sorting algorithms, you should use the System.currentTimeMillis() method, which returns a long that contains the current time (in milliseconds). Call System.currentTimeMillis() before and after the algorithm runs, and subtract the two times. Unfortunately, calling currentTimeMillis before and after a function only gives an accurate time estimate if the function takes a long time to run (that is, at least a couple of seconds).  Since running any sorting algorithm on a list of size 5 will take considerably less than a second, to test how long sorting algorithms take for small lists, you will need to do something like the following:

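   long startTime, endTime;
   double duration;
   int i, j;
   int[] list = new int[listsize];
   Random randomGenerator = new Random();   // requires: import java.util.Random;
   Sort sorter = new Sort();

   startTime = System.currentTimeMillis();
   for (i = 0; i < NUMITER; i++) {
      // Refill the list with fresh random values before every sort
      for (j = 0; j < listsize; j++) {
         list[j] = randomGenerator.nextInt();
      }
      sorter.quickSort(list, 0, listsize - 1);
   }
   endTime = System.currentTimeMillis();

   // Average time per iteration, in milliseconds
   duration = ((double) (endTime - startTime)) / NUMITER;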

You'll have to play around with different values for NUMITER -- when sorting a small list you want it to be quite large (100000 or larger is reasonable for very small lists), but when doing a Θ(n^2) sort on a large list, it should be much smaller (on a list of size 100000 you probably want a value for NUMITER that is close to 1 for insertion sort, for instance).

You might notice that there is some non-sorting work done in the above code -- namely, filling the list before each sort can take place.  This work takes a small amount of time in comparison to the sorting, as you can easily check for yourself:

   startTime = System.currentTimeMillis();
   for (i = 0; i < NUMITER; i++) {
      // Same setup work as before, with no sorting
      for (j = 0; j < listsize; j++) {
         list[j] = randomGenerator.nextInt();
      }
   }
   endTime = System.currentTimeMillis();
   duration = ((double) (endTime - startTime)) / NUMITER;

You should subtract this setup time from your algorithm running time, to get more accurate results.  Your main program may run in either interactive mode or batch mode (though you do not need to implement both modes, just the one that is easiest for you to use in testing.)
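
The subtraction described above could be packaged roughly as follows. This is only a sketch: the class and method names are invented for the example, and java.util.Arrays.sort stands in for whichever of your own sorting methods you are timing. The result is clamped at zero because timer noise can make the difference slightly negative for very small lists.

```java
import java.util.Arrays;
import java.util.Random;

public class TimingSketch {
    // Fills the list with fresh random values (the setup work).
    private static void fill(int[] list, Random rng) {
        for (int j = 0; j < list.length; j++) {
            list[j] = rng.nextInt();
        }
    }

    // Average milliseconds per iteration spent on setup alone.
    public static double timeSetup(int listSize, int numIter, Random rng) {
        int[] list = new int[listSize];
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            fill(list, rng);
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    // Average milliseconds per iteration spent on setup plus sorting.
    public static double timeFillAndSort(int listSize, int numIter, Random rng) {
        int[] list = new int[listSize];
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            fill(list, rng);
            Arrays.sort(list);   // stand-in: replace with the sort under test
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    // Estimated sorting time per iteration, with setup time subtracted.
    public static double netSortTime(int listSize, int numIter, long seed) {
        double total = timeFillAndSort(listSize, numIter, new Random(seed));
        double setup = timeSetup(listSize, numIter, new Random(seed));
        return Math.max(0.0, total - setup);
    }
}
```

Using the same seed for both measurements means both loops generate identical data, so the setup work being subtracted matches the setup work that was timed alongside the sort.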

Building a Better Sorting Algorithm

After looking at the data from the previous section, you can see that when the lists get large, quicksort is clearly the fastest comparison sorting algorithm (hence the name). However, when the lists are small enough, quicksort runs slower than some of the Θ(n^2) algorithms.  This might not seem important until you note that when sorting a large list with quicksort, a great many small sublists must be sorted.  While the savings from sorting one small list with a faster algorithm are negligible, sorting hundreds of small lists with a faster algorithm can make a difference in the overall efficiency of the sort.  For Part 3 of the assignment, you will combine quicksort with another sorting algorithm to build the fastest possible sorting algorithm.  You have several options -- 
What does "small enough" mean?  You can try a percentage of the list (say, 5% or 10%), or an absolute number (8 elements, 10 elements, 15 elements, etc.), or something else of your choosing.  You can use the data from Part 2 to help decide how to code the hybrid sort, but you should also run tests to ensure that you have the most efficient algorithm possible.  For instance, the data from Part 2 will give you a good idea about where the cutoff for the hybrid sort should be, but you should test all the nearby cutoff values to ensure that you have the best one.  You should also be sure that your hybrid quicksort has reasonable performance on all lists -- most notably, it should be efficient on sorted and inverse sorted lists as well as random lists.  Try various methods for choosing the pivot element, to try to get the best possible behavior.
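
One possible shape for such a hybrid is sketched below: quicksort that hands sublists at or below an absolute cutoff to insertion sort, with a median-of-three pivot to avoid quadratic behavior on sorted and inverse sorted input. The class name, the CUTOFF value of 10, and the choice of insertion sort and median-of-three are assumptions made for illustration, not the required design; your timing data should drive the real choices.

```java
public class HybridSortSketch {
    // Assumed cutoff; tune it using your own measurements.
    static final int CUTOFF = 10;

    // Sorts list[low..high] in ascending order.
    public static void hybridSort(int[] list, int low, int high) {
        if (high - low + 1 <= CUTOFF) {
            insertionSort(list, low, high);   // small sublist: use the simple sort
            return;
        }
        int pivot = medianOfThree(list, low, high);
        int i = low, j = high;
        // Partition: move elements < pivot left and elements > pivot right
        while (i <= j) {
            while (list[i] < pivot) i++;
            while (list[j] > pivot) j--;
            if (i <= j) {
                int tmp = list[i]; list[i] = list[j]; list[j] = tmp;
                i++;
                j--;
            }
        }
        hybridSort(list, low, j);
        hybridSort(list, i, high);
    }

    // Median value of the first, middle, and last elements of the range.
    private static int medianOfThree(int[] list, int low, int high) {
        int a = list[low], b = list[low + (high - low) / 2], c = list[high];
        return Math.max(Math.min(a, b), Math.min(Math.max(a, b), c));
    }

    private static void insertionSort(int[] list, int low, int high) {
        for (int i = low + 1; i <= high; i++) {
            int key = list[i];
            int j = i - 1;
            while (j >= low && list[j] > key) {
                list[j + 1] = list[j];
                j--;
            }
            list[j + 1] = key;
        }
    }
}
```

To compare cutoffs, one approach is to time this routine on the same random, sorted, and inverse sorted lists for each candidate value and keep the fastest.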

Sorting Algorithms in Detail

What to Turn In

You need to turn in hardcopies of:
In addition to hardcopies, you need to submit all required files to the subversion repository:

https://www.cs.usfca.edu/svn/<username>/cs245/Project3/

Put this subversion directory at the top of your printout, to make life easier on Ye.  You do not want the TA to be grumpy when he is grading your code!

Due Date

This project is due at 3:30 p.m. on Friday, April 17th.  Projects turned in after that, but by 3:30 p.m. on Monday, April 20th, will receive 75% credit.  Projects turned in after 3:30 p.m. on Monday, April 20th will receive no credit.

Collaboration

It is OK for you to discuss solutions to this project with your classmates.  However, no collaboration should ever involve looking at a classmate's source code!  It is usually extremely easy to determine that someone has copied a program, even when the individual doing the copying has changed identifier names and comments.

Supporting Files