For your third project, you will implement a number of sorting
algorithms, and then test their performance. This project will
consist of not just coding, but also testing your code using large
data sets. Note that when all of your code is complete and debugged,
you still have a fair amount of work to do -- so start early!
Coding Sorting Algorithms
For Part 1 of the assignment, you will need to write 8 sorting
algorithms: Bubble Sort, Insertion Sort, Shell Sort, Bucket Sort, Heap
Sort, Quicksort, Merge Sort, and Radix sort. All of these sorting
algorithms should be able to sort the input array in either ascending
or descending order.
In addition to the above 8 algorithms, you will also need to write a
version of both insertion sort and merge sort that sorts linked lists,
in either ascending or descending order. Finally, you will use data
drawn from your tests of the sorting algorithms to create an optimized
hybrid sort. Your sorting project will contain:
It is critically important that:
- Your sorting class is named Sort
- Your Sort class has a constructor that takes no parameters
- Your Sort class implements the interface as is, without changes.
Otherwise, the grading program will not function correctly, and you
will lose points! Also, check to make sure your sorting algorithms are
correct. Several of these algorithms (most notably bucket sort) are
complicated, and it is easy to make a subtle mistake when coding. Be
sure to read the detailed requirements for each sorting algorithm.
You should not use any instance variables for this assignment, only
local method variables (though you can have final instance variables
if you want symbolic constants). You are of course allowed to write as
many extra (private) helper functions as you would like.
Efficiency Testing
After you have coded your algorithms, you need to test them to see how
long each sorting algorithm takes to run. You should test each
algorithm on both random and sorted lists of sizes 5-20, 50, 100, 200,
300, 400, 500, 1000, 2000, 3000, 4000, 5000, 10000, 50000, and 100000.
You can get a random list using the Java class Random, located in
java.util.Random. To test the speed of your sorting algorithms, you
should use the System.currentTimeMillis() method, which returns a long
that contains the current time (in milliseconds). Call
System.currentTimeMillis() before and after the algorithm runs, and
subtract the two times. Unfortunately, using currentTimeMillis before
and after a function only gives an accurate time estimate if the
function takes a long time to run (that is, at least a couple of
seconds). Since running any sorting algorithm on a list of size 5 will
take considerably less than a second, to test how long sorting
algorithms take for small lists, you will need to do something like the
following:
    long startTime, endTime;
    double duration;
    int[] list = new int[listsize];
    Random randomGenerator = new Random();
    Sort sorter = new Sort();
    startTime = System.currentTimeMillis();
    for (int i = 0; i < NUMITER; i++) {
        // refill the list with fresh random data before each sort
        for (int j = 0; j < listsize; j++) {
            list[j] = randomGenerator.nextInt();
        }
        sorter.quickSort(list, 0, listsize - 1, false);
    }
    endTime = System.currentTimeMillis();
    // average time per iteration, in milliseconds
    duration = ((double) (endTime - startTime)) / NUMITER;
You'll have to play around with different values for NUMITER -- when
sorting a small list you want it to be quite large (100000 or larger is
reasonable for very small lists), but when doing a Θ(n^2) sort on a
large list, it should be much smaller (on a list of size 100000 you
probably want a value for NUMITER that is close to 1 for insertion
sort, for instance).
You might notice that there is some non-sorting work done in the above
code -- mainly, setting up the list before each sort can take place.
This work takes a small amount of time in comparison to the sorting,
as you can easily check for yourself:
    startTime = System.currentTimeMillis();
    for (int i = 0; i < NUMITER; i++) {
        // same setup work as before, but with no sort
        for (int j = 0; j < listsize; j++) {
            list[j] = randomGenerator.nextInt();
        }
    }
    endTime = System.currentTimeMillis();
    duration = ((double) (endTime - startTime)) / NUMITER;
You should subtract this setup time from your algorithm running time,
to get more accurate results. Your main program may run in either
interactive mode or batch mode (though you do not need to implement
both modes, just the one that is easiest for you to use in testing).
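The two timing loops above can be combined into one small harness. The following is only a sketch under stated assumptions: the class and method names (TimingSketch, timeSetup, timeSetupAndSort, sortOnlyMillis) are made up for illustration, and Arrays.sort stands in for a call into your own Sort class. Both loops use the same seed so that they do the same setup work.

```java
import java.util.Arrays;
import java.util.Random;

class TimingSketch {
    // Average milliseconds per iteration spent only on filling the list.
    static double timeSetup(int[] list, int numIter, Random rng) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < list.length; j++) {
                list[j] = rng.nextInt();
            }
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    // Average milliseconds per iteration spent on filling the list and sorting it.
    static double timeSetupAndSort(int[] list, int numIter, Random rng) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < list.length; j++) {
                list[j] = rng.nextInt();
            }
            Arrays.sort(list); // stand-in: replace with a call into your Sort class
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    // Estimated sort-only time. Clamped at 0 because timer noise can make
    // the difference slightly negative for very small lists.
    static double sortOnlyMillis(int listSize, int numIter, long seed) {
        int[] list = new int[listSize];
        double total = timeSetupAndSort(list, numIter, new Random(seed));
        double setup = timeSetup(list, numIter, new Random(seed));
        return Math.max(0.0, total - setup);
    }

    public static void main(String[] args) {
        System.out.println(sortOnlyMillis(20, 100000, 42L) + " ms per sort of 20 elements");
    }
}
```

If the reported time for a small list comes out as 0, NUMITER is probably too small for the millisecond clock to register the difference.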
Building a Better Sorting Algorithm
After looking at the data from the previous section, you can see that
when the lists get large, quicksort is clearly the fastest comparison
sorting algorithm (hence the name). However, when the lists are small
enough, quicksort runs slower than some of the Θ(n^2) algorithms. This
might not seem important until you note that when sorting a large list
with quicksort, many, many small sublists must be sorted. While the
savings on sorting one small list with a faster algorithm is
negligible, sorting hundreds of small lists with a faster algorithm
can make a difference in the overall efficiency of the sort. For part
3 of the assignment, you will combine quicksort with another sorting
algorithm to build the fastest possible sorting algorithm. You have
several options --
- Use quicksort until the list gets small enough, and then
use another sort, such as insertion sort, to sort the small lists
- Use quicksort to "mostly" sort the list. That is, use
quicksort to sort the list until a cutoff size is reached, and then
stop. The list will now be mostly sorted, and you can use
insertion sort on the entire list to quickly complete the sorting (not
unlike the strategy used in Shell Sort)
- Use some other method of your own devising.
What does "small enough" mean? You can try a percentage of the list
(say, 5% or 10%), or an absolute number (8 elements, 10 elements, 15
elements, etc.), or something else of your choosing. You can use the
data from part 2 to help decide how to code the hybrid sort, but you
should also run tests to ensure that you have the most efficient
algorithm possible. For instance, the data from part 2 will give you a
good idea about where the cutoff for the hybrid sort should be, but
you should test all the nearby cutoff values to ensure that you have
the best one. You should also be sure that your hybrid quicksort has
reasonable performance on all lists -- most notably, it should be
efficient on sorted and inverse sorted lists as well as random lists.
Try various methods for choosing the pivot element, to try to get the
best possible behavior.
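As a rough illustration of the first option (quicksort down to a cutoff, then insertion sort on each small sublist), here is a minimal sketch. Everything in it is an assumption made for illustration: the class name HybridSketch, the helper names, the CUTOFF value of 10, and the use of the middle element as the pivot. It is not meant as the best configuration, only as a starting point to measure against.

```java
class HybridSketch {
    static final int CUTOFF = 10; // assumed starting point; tune it with your own timings

    // True if a must come strictly before b in the requested order.
    static boolean before(int a, int b, boolean reversed) {
        return reversed ? a > b : a < b;
    }

    static void swap(int[] arr, int i, int j) {
        int t = arr[i];
        arr[i] = arr[j];
        arr[j] = t;
    }

    // Insertion sort restricted to arr[low..high].
    static void insertionSort(int[] arr, int low, int high, boolean reversed) {
        for (int i = low + 1; i <= high; i++) {
            int value = arr[i];
            int j = i - 1;
            while (j >= low && before(value, arr[j], reversed)) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = value;
        }
    }

    static void hybridSort(int[] arr, int low, int high, boolean reversed) {
        // Small (or empty) ranges are handed to insertion sort.
        if (high - low + 1 <= CUTOFF) {
            insertionSort(arr, low, high, reversed);
            return;
        }
        int pivot = arr[low + (high - low) / 2]; // placeholder pivot choice
        int i = low, j = high;
        // Partition: move elements that belong before the pivot to the left.
        while (i <= j) {
            while (before(arr[i], pivot, reversed)) i++;
            while (before(pivot, arr[j], reversed)) j--;
            if (i <= j) {
                swap(arr, i, j);
                i++;
                j--;
            }
        }
        hybridSort(arr, low, j, reversed);
        hybridSort(arr, i, high, reversed);
    }
}
```

Timing this sketch with several CUTOFF values and pivot rules, on random, sorted, and inverse sorted lists, is one way to gather the comparison data the write-up asks for.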
Sorting Algorithms in Detail
- public void insertionSort(int[] array, int lowindex, int highindex, boolean reversed)
This is the most straightforward of the sorting algorithms to code.
There are only two wrinkles -- your insertion sort needs to work over a
range of indices in the array, just like quicksort, and you need to be
able to sort the list backwards, if the reversed flag is true. Your
algorithm should sort all elements in the array in the range
lowindex..highindex (inclusive). You should not touch any of the data
elements outside the range lowindex..highindex.
- public void bubbleSort(int[] array, int lowindex, int highindex, boolean reversed)
Also very straightforward. As with insertion sort above, your
sorting algorithm needs to work over a range of indices in the
array, and you need to be able to sort the list backwards, if the
reversed flag is true. Your algorithm should sort all elements in the
array in the range lowindex..highindex (inclusive). You should not
touch any of the data elements outside the range lowindex..highindex.
- public void shellSort(int[] array, int lowindex, int highindex, boolean reversed)
Your implementation of Shell Sort needs to use Hibbard's increments:
1, 3, 7, 15, ..., 2^k - 1.
Thus if the range of elements contains 100 elements, the first sort
would be a 63-sort, followed by a 31-sort, 15-sort, 7-sort, 3-sort and
1-sort. (The code in the notes uses Shell's increments - in this
case 50, 25, 12, 6, 3, 1). As with insertion sort, you need to be
able to sort only a range of the array, and also be able to
inverse-sort the array.
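The increment sequence for a given range size can be generated as in the sketch below. The class name HibbardGaps and the method name largestGap are hypothetical, and the choice of "largest increment strictly less than the number of elements" is an assumption that matches the 100-element example above.

```java
class HibbardGaps {
    // Largest Hibbard increment (2^k - 1) strictly less than n; 0 if n < 2.
    static int largestGap(int n) {
        int gap = 0;
        while (2 * gap + 1 < n) {
            gap = 2 * gap + 1; // 1, 3, 7, 15, ...
        }
        return gap;
    }

    public static void main(String[] args) {
        // Integer division maps 2^k - 1 to 2^(k-1) - 1, so halving walks
        // back down the sequence. prints: 63 31 15 7 3 1
        for (int gap = largestGap(100); gap >= 1; gap /= 2) {
            System.out.print(gap + " ");
        }
        System.out.println();
    }
}
```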
- public void bucketSort(int[] array, int lowindex, int highindex, boolean reversed)
Your implementation of Bucket Sort should use half as many buckets as
there are elements to be sorted. Thus, if highindex - lowindex + 1 ==
100, you should use 50 buckets. Assume that the data values are evenly
distributed over the range of the list. You will need to do a quick
run through the list to find the range of values stored in the list.
You need to be able to handle sorting negative as well as positive
values. While the list that you are sorting will be ints, you might
need to use longs in some places when calculating bucket size. As
before, you need to be able to inverse-sort the list.
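To see why longs can be needed, consider a list that contains both Integer.MIN_VALUE and Integer.MAX_VALUE: the difference between them does not fit in an int. The sketch below shows one possible way to map a value to a bucket; the class name BucketIndexSketch and the method bucketIndex are hypothetical, and other formulas for bucket size are equally valid.

```java
class BucketIndexSketch {
    // Map value to a bucket number in [0, numBuckets - 1], given the
    // smallest (min) and largest (max) values in the range being sorted.
    static int bucketIndex(int value, int min, int max, int numBuckets) {
        long offset = (long) value - min;   // can be as large as 2^32 - 1
        long span = (long) max - min + 1;   // number of possible distinct values
        // offset < span, so the result is always less than numBuckets
        return (int) (offset * numBuckets / span);
    }

    public static void main(String[] args) {
        int min = Integer.MIN_VALUE, max = Integer.MAX_VALUE;
        System.out.println(bucketIndex(min, min, max, 50)); // prints 0
        System.out.println(bucketIndex(0, min, max, 50));   // prints 25
        System.out.println(bucketIndex(max, min, max, 50)); // prints 49
    }
}
```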
- public void heapSort(int[] array, int lowindex, int highindex, boolean reversed)
As with insertion sort, you need to sort a range of indices. Do not
copy the range to be sorted into a temporary array, sort it, and then
copy back -- you need to sort the data in place. You also need to
be able to inverse sort the list.
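One way to sort a range in place is to treat arr[low] as the root of the heap and add low to every heap position when indexing the array. The sketch below follows that idea; the class name HeapRangeSketch and its helper names are hypothetical, and it assumes a max-heap for ascending order and a min-heap for descending order.

```java
class HeapRangeSketch {
    // True if a should sit above b in the heap: a max-heap for ascending
    // order, a min-heap for descending order.
    static boolean above(int a, int b, boolean reversed) {
        return reversed ? a < b : a > b;
    }

    static void swap(int[] arr, int i, int j) {
        int t = arr[i];
        arr[i] = arr[j];
        arr[j] = t;
    }

    // Restore the heap property below heap position root, for a heap of
    // size elements stored in arr[low .. low + size - 1].
    static void siftDown(int[] arr, int low, int root, int size, boolean reversed) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) return;
            // choose the child that belongs nearer the top
            if (child + 1 < size && above(arr[low + child + 1], arr[low + child], reversed)) {
                child++;
            }
            if (!above(arr[low + child], arr[low + root], reversed)) return;
            swap(arr, low + root, low + child);
            root = child;
        }
    }

    static void heapSort(int[] arr, int low, int high, boolean reversed) {
        int size = high - low + 1;
        // build the heap inside the range
        for (int root = size / 2 - 1; root >= 0; root--) {
            siftDown(arr, low, root, size, reversed);
        }
        // repeatedly move the top of the heap to the end of the range
        for (int end = size - 1; end > 0; end--) {
            swap(arr, low, low + end);
            siftDown(arr, low, 0, end, reversed);
        }
    }
}
```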
- public void quickSort(int[] array, int lowindex, int highindex, boolean reversed)
This is the standard, unmodified version of quicksort. You should
use a median-of-three to pick the pivot. That is, pick three
elements (first, middle, and last, for instance) and use the median of
those 3 elements as the pivot. This version of quicksort should
not be a hybrid. Note that you will need to do some special-case
work on small lists (since obviously you cannot find the median of
three on a list with 2 elements).
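Picking the median of the first, middle, and last elements might look like the sketch below. The class name MedianOfThreeSketch and the method medianIndex are hypothetical. Note that the median is the same element whether the list is being sorted forwards or backwards, so this step does not need the reversed flag.

```java
class MedianOfThreeSketch {
    // Returns the index (low, mid, or high) that holds the median of
    // arr[low], arr[mid], and arr[high]. Intended for ranges of at least
    // 3 elements; smaller ranges are left to the caller's special case.
    static int medianIndex(int[] arr, int low, int high) {
        int mid = low + (high - low) / 2;
        int a = arr[low], b = arr[mid], c = arr[high];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return low;
        return high;
    }

    public static void main(String[] args) {
        System.out.println(medianIndex(new int[] {5, 1, 9}, 0, 2)); // prints 0
        System.out.println(medianIndex(new int[] {1, 5, 9}, 0, 2)); // prints 1
        System.out.println(medianIndex(new int[] {9, 1, 5}, 0, 2)); // prints 2
    }
}
```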
- public void mergeSort(int[] array, int lowindex, int highindex, boolean reversed)
Note that this function takes only the array, the index range, and the
reversed flag, so you will probably want to use a private version of
mergeSort that takes an additional parameter (such as a temporary
array), and call that method from this mergeSort. Allocating
memory over and over again in the recursive calls is a bad idea.
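The wrapper-plus-helper arrangement could be sketched as follows. The class name MergeSortSketch and the private helper name sort are hypothetical, and the scratch array sized to the range is just one reasonable choice for the extra parameter; the point is that it is allocated exactly once.

```java
class MergeSortSketch {
    // Public entry point: allocates the scratch array exactly once.
    static void mergeSort(int[] arr, int low, int high, boolean reversed) {
        if (high <= low) return;
        int[] temp = new int[high - low + 1];
        sort(arr, temp, low, high, reversed);
    }

    // Recursive helper that reuses the same scratch array at every level.
    private static void sort(int[] arr, int[] temp, int low, int high, boolean reversed) {
        if (high <= low) return;
        int mid = low + (high - low) / 2;
        sort(arr, temp, low, mid, reversed);
        sort(arr, temp, mid + 1, high, reversed);

        // Copy both sorted halves into temp[0..n-1], then merge back.
        int n = high - low + 1;
        System.arraycopy(arr, low, temp, 0, n);
        int left = 0, leftEnd = mid - low, right = leftEnd + 1, out = low;
        while (left <= leftEnd && right < n) {
            // take from the right half only when it strictly belongs first
            boolean takeRight = reversed ? temp[right] > temp[left] : temp[right] < temp[left];
            arr[out++] = takeRight ? temp[right++] : temp[left++];
        }
        while (left <= leftEnd) arr[out++] = temp[left++];
        while (right < n) arr[out++] = temp[right++];
    }
}
```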
- public void radixSort(int[] array, int lowindex, int highindex, boolean reversed)
As with all of the other sorting algorithms, radix sort needs to be
able to sort elements in the range lowindex to highindex. The
notes use base-10 for radix sort -- for this assignment, you should use
base-n, where n is the number of elements in the list to be sorted
(that is, highindex - lowindex + 1).
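A base-n radix sort can be sketched as a series of stable counting passes, one per base-n digit. The version below is only one possible approach and makes several assumptions that are not part of the assignment text: it handles negative values by sorting on (value - minimum) computed as a long, it produces descending order by reversing the range at the end, and the names RadixSketch and digit are hypothetical.

```java
import java.util.Arrays;

class RadixSketch {
    // Base-`base` digit of (value - min) at the given place value.
    static int digit(int value, int min, long place, int base) {
        return (int) ((((long) value - min) / place) % base);
    }

    static void radixSort(int[] arr, int low, int high, boolean reversed) {
        int n = high - low + 1;
        if (n < 2) return; // nothing to sort, and base 1 is not usable
        int min = arr[low], max = arr[low];
        for (int i = low + 1; i <= high; i++) {
            min = Math.min(min, arr[i]);
            max = Math.max(max, arr[i]);
        }
        long range = (long) max - min; // largest key once min is subtracted
        int[] out = new int[n];
        int[] count = new int[n];
        // One stable counting pass per base-n digit, least significant first.
        for (long place = 1; place <= range; place *= n) {
            Arrays.fill(count, 0);
            for (int i = low; i <= high; i++) {
                count[digit(arr[i], min, place, n)]++;
            }
            // prefix sums: count[d] becomes the exclusive end slot for digit d
            for (int d = 1; d < n; d++) {
                count[d] += count[d - 1];
            }
            // walk backwards so equal digits keep their relative order
            for (int i = high; i >= low; i--) {
                out[--count[digit(arr[i], min, place, n)]] = arr[i];
            }
            System.arraycopy(out, 0, arr, low, n);
        }
        if (reversed) {
            for (int i = low, j = high; i < j; i++, j--) {
                int t = arr[i];
                arr[i] = arr[j];
                arr[j] = t;
            }
        }
    }
}
```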
- public LLNode insertionSortLL(LLNode list, boolean reversed)
This function uses Insertion Sort to sort a linked list. It
should be called in a similar fashion to tree functions, as follows:
A = insertionSortLL(A, false);
Note that when using linked lists, you need to implement insertion sort
in a slightly different way (for instance, an inverse sorted list
will probably give best-case performance, while a sorted list will
probably give you worst-case performance).
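The best-case and worst-case behavior described above follows from scanning the sorted portion from its head. A minimal sketch of that approach is shown below. The LLNode class here is a hypothetical stand-in with assumed fields data and next; the real LLNode supplied with the project may differ, as may the enclosing class name InsertionLLSketch.

```java
class InsertionLLSketch {
    // Hypothetical node type; the project's actual LLNode may differ.
    static class LLNode {
        int data;
        LLNode next;
        LLNode(int data, LLNode next) {
            this.data = data;
            this.next = next;
        }
    }

    // True if a must come strictly before b in the requested order.
    static boolean before(int a, int b, boolean reversed) {
        return reversed ? a > b : a < b;
    }

    static LLNode insertionSortLL(LLNode list, boolean reversed) {
        LLNode sorted = null; // head of the already-sorted part
        while (list != null) {
            LLNode node = list; // detach the next unsorted node
            list = list.next;
            if (sorted == null || before(node.data, sorted.data, reversed)) {
                // belongs at the front: constant time, which is why an
                // inverse sorted input is the best case
                node.next = sorted;
                sorted = node;
            } else {
                // walk from the head until the insertion point is found
                LLNode prev = sorted;
                while (prev.next != null && !before(node.data, prev.next.data, reversed)) {
                    prev = prev.next;
                }
                node.next = prev.next;
                prev.next = node;
            }
        }
        return sorted;
    }
}
```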
- public LLNode mergeSortLL(LLNode list, boolean reversed)
Much as above, this function sorts a linked list using merge sort. You
should not allocate any extra memory for this version of merge sort (no
calls to new!). Instead, you need to rearrange the linked list elements
that are passed in.
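Sorting by relinking, with no calls to new inside the sort, might look like the following sketch. As before, LLNode is a hypothetical stand-in with assumed fields data and next, and the class name MergeLLSketch is made up. The list is split with a slow and a fast pointer, each half is sorted recursively, and the halves are merged by redirecting next references.

```java
class MergeLLSketch {
    // Hypothetical node type; the project's actual LLNode may differ.
    static class LLNode {
        int data;
        LLNode next;
        LLNode(int data, LLNode next) {
            this.data = data;
            this.next = next;
        }
    }

    static LLNode mergeSortLL(LLNode list, boolean reversed) {
        if (list == null || list.next == null) return list;

        // Split: slow stops at the last node of the first half.
        LLNode slow = list, fast = list.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        LLNode second = slow.next;
        slow.next = null;

        LLNode a = mergeSortLL(list, reversed);
        LLNode b = mergeSortLL(second, reversed);

        // Merge by relinking existing nodes; no new nodes are created.
        LLNode head = null, tail = null;
        while (a != null && b != null) {
            boolean takeB = reversed ? b.data > a.data : b.data < a.data;
            LLNode next;
            if (takeB) {
                next = b;
                b = b.next;
            } else {
                next = a;
                a = a.next;
            }
            if (tail == null) head = next; else tail.next = next;
            tail = next;
        }
        // Both halves were non-empty, so tail is set; attach the leftovers.
        tail.next = (a != null) ? a : b;
        return head;
    }
}
```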
- void optimizedQuickSort(int array[], int lowindex, int highindex, boolean reversed)
This should be a hybrid of quicksort and some other sorting algorithm.
Make it as efficient as possible.
What to Turn In
You need to turn in hardcopies of:
- Source code for all of your sorting algorithms
- Source code for your main program, which you used for the performance testing
- Running time for all 11 algorithms (8 standard, 2 linked-list
versions, 1 optimized quicksort) for lists of size 5-20, 50, 100, 200,
500, 1000, 2000, 5000, 10000, and 50000. Be sure to subtract out
the overhead time costs! You should include sorted and inverse
sorted lists as well as random lists for each list size in these tests.
The results should not be handwritten! Use your favorite
word processor / spreadsheet / etc. instead.
- A brief (one page is enough) document on how you created your
hybrid sorting algorithm -- which approaches you tried, what different
parameters you chose, and how much of a difference these changes made
to the efficiency of your algorithm.
In addition to hardcopies, you need to submit all required files to the subversion repository:
https://www.cs.usfca.edu/svn/<username>/cs245/Project3/
Put this subversion directory at the top of your printout, to make life
easier on Ye. You do not want the TA to be grumpy when he is
grading your code!
Due Date
This project is due at 3:30 on Friday, April 17th. The project
may be turned in after Friday, but by Monday, April 20th at 3:30, for
75% credit. Projects turned in after 3:30 on Monday, April 20th will
receive no credit.
Collaboration
It is OK for you to discuss solutions to this program with your
classmates. However, no collaboration should ever involve looking at
one of your classmates' source programs! It is usually extremely easy
to determine that someone has copied a program, even when the
individual doing the copying has changed identifier names and comments.
Supporting Files