Sorting (Due April 19th, 2017)

For your third project, you will implement a number of sorting algorithms, and then test their performance. This project will consist of not just coding, but also testing your code using large data sets. Note that when all of your code is complete and debugged, you still have a fair amount of work to do -- so start early!

Coding Sorting Algorithms

For Part 1 of the assignment, you will need to write 12 sorting algorithms:

6 Sorting algorithms that sort Arrays of Comparables
- insertionSort
- selectionSort
- shellSort
- heapSort
- quickSort
- optimizedQuickSort
4 Sorting algorithms that sort Linked Lists of Comparables
- insertionSortLL
- selectionSortLL
- mergeSortLL
- quickSortLL
1 Sorting algorithm that sorts an Array of ints
- bucketSort
1 Sorting algorithm that sorts an Array of Strings
- radixSort

Your sorting project will contain:

Sort class containing static methods (skeleton provided)

Linked List element class (provided)
SortTest class which contains your main program (You need to create this)

It is critically important that:

Your sorting class be named Sort
You do not change any of the function signatures of any of the provided function stubs in the Sort class

Otherwise, the grading program will not function correctly, and you will lose points! Also, check to make sure your sorting algorithms are correct. Several of these algorithms (most notably bucket sort and String-based radix sort) are complicated, and it is easy to make a subtle mistake when coding. Be sure to read the detailed requirements for each sorting algorithm.

Efficiency Testing

After you have coded your algorithms, you need to test them to see how long each sorting algorithm takes to run. You should test each algorithm on both random and sorted lists of sizes 1000, 5000, 10000, 50000, 75000, 100000 and 500000. You can get a random list using the Java class Random, located in java.util.Random.

To test the speed of your sorting algorithms, you should use the System.currentTimeMillis() method, which returns a long that contains the current time (in milliseconds). Call System.currentTimeMillis() before and after the algorithm runs, and subtract the two times. Unfortunately, using currentTimeMillis before and after a function only gives an accurate time estimate if the function takes a long time to run (that is, at least a couple of seconds). Since running some sorting algorithms on a list of size 1000 will take a very short time, you will need to do something like the following:

long startTime, endTime;
double duration;

Random randomGenerator = new Random();
Sort sorter = new Sort();
startTime = System.currentTimeMillis();
for (int i = 0; i < NUMITER; i++)
{
   // Refill the list with fresh random values before each sort
   for (int j = 0; j < listsize; j++)
       list[j] = randomGenerator.nextInt();

   sorter.quickSort(list, 0, listsize - 1, false);
}
endTime = System.currentTimeMillis();
// Average time per iteration, in milliseconds
duration = ((double) (endTime - startTime)) / NUMITER;

You'll have to play around with different values for NUMITER -- it will need to change depending upon the size of the list and the algorithm.
You might notice that there is some non-sorting work done in the above algorithm -- mainly, setting up the list before each sort can take place. This work takes a small amount of time in comparison to the sorting, as you can easily check for yourself:

startTime = System.currentTimeMillis();
for (int i = 0; i < NUMITER; i++)
{
    // Same setup work as above, with no sorting
    for (int j = 0; j < listsize; j++)
        list[j] = randomGenerator.nextInt();
}
endTime = System.currentTimeMillis();
duration = ((double) (endTime - startTime)) / NUMITER;

You should subtract this setup time from your algorithm running time, to get more accurate results. Your main program may run in either interactive mode or batch mode (though you do not need to implement both modes, just the one that is easiest for you to use in testing.)
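
The two timing loops above can be folded into a pair of helpers so that the setup cost is measured and subtracted in one place. The following is only a sketch: the class name TimingSketch and its method names are hypothetical, and java.util.Arrays.sort stands in for whichever Sort method you are actually timing.

```java
import java.util.Arrays;
import java.util.Random;

class TimingSketch {
    // Average milliseconds per iteration spent only on filling the list.
    static double averageFillMillis(int[] list, int numIter, Random rng) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < list.length; j++) {
                list[j] = rng.nextInt();
            }
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    // Average milliseconds per iteration spent on filling AND sorting.
    static double averageFillAndSortMillis(int[] list, int numIter, Random rng) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < numIter; i++) {
            for (int j = 0; j < list.length; j++) {
                list[j] = rng.nextInt();
            }
            Arrays.sort(list); // stand-in for the sort being measured
        }
        long end = System.currentTimeMillis();
        return ((double) (end - start)) / numIter;
    }

    public static void main(String[] args) {
        int[] list = new int[1000];
        int numIter = 5000; // tune per list size and algorithm
        Random rng = new Random();
        double setup = averageFillMillis(list, numIter, rng);
        double total = averageFillAndSortMillis(list, numIter, rng);
        System.out.println("Approximate sort time (ms): " + (total - setup));
    }
}
```

For very fast runs the difference can come out slightly negative because of timer resolution; raising the iteration count usually smooths that out.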

Building a Better Sorting Algorithm

When the list gets large, quicksort is clearly the fastest comparison sorting algorithm (hence the name). However, when the lists are small enough, quicksort runs slower than some of the Θ(n²) algorithms. This might not seem important until you note that when sorting a large list with quicksort, a great many small sublists must be sorted. While the savings on sorting one small list with a faster algorithm is negligible, sorting hundreds of small lists with a faster algorithm can make a difference in the overall efficiency of the sort. For part 3 of the assignment, you will combine quicksort with another sorting algorithm to build the fastest possible sorting algorithm. You have several options --

Use quicksort until the list gets small enough, and then use another sort (such as insertion sort) to sort the small lists
Use quicksort to "mostly" sort the list. That is, use quicksort to sort the list until a cutoff size is reached, and then stop. The list will now be mostly sorted, and you can use insertion sort on the entire list to quickly complete the sorting (not unlike the strategy used in Shell Sort)
Try to make partition (where the work happens!) as efficient as possible
Write a version of quicksort that is (partially) tail-recursive
Some other method of your own devising.

What does "small enough" mean? You can try a percentage of the list (say, 5% or 10%), or an absolute number (8 elements, 10 elements, 15 elements, 100 elements, etc), or something else of your choosing. Your tests should ensure that you have the most efficient algorithm possible. You should also be sure that your hybrid quicksort has reasonable performance on all lists -- most notably, it should be efficient on sorted and inverse sorted lists as well as random lists. Try various methods for choosing the pivot element, to try to get the best possible behavior.

Sorting Algorithms in Detail

static <T extends Comparable<T>> void insertionSort(T[] array, int lowIndex, int highIndex, boolean reversed)

This is the most straightforward of the sorting algorithms to code - there are only two wrinkles -- your insertion sort needs to work over a range of indices in the array, just like quickSort, and you need to be able to sort the list backwards, if the reversed flag is true. Your algorithm should sort all elements in the array in the range lowindex..highindex (inclusive). You should not touch any of the data elements outside the range lowindex .. highindex. Note that you need to be able to sort any Comparable object.
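
One way to handle the reversed flag without duplicating every loop is to route all comparisons through a single helper. This is a sketch only; the class CompareSketch and the method comesBefore are hypothetical names and are not part of the provided skeleton.

```java
class CompareSketch {
    // Returns true when a must be placed strictly before b in the requested order:
    // ascending when reversed is false, descending when reversed is true.
    static <T extends Comparable<T>> boolean comesBefore(T a, T b, boolean reversed) {
        int c = a.compareTo(b);
        return reversed ? c > 0 : c < 0;
    }
}
```

Because equal elements never "come before" each other under this test, a sort built on it does not move equal elements past one another unnecessarily.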

public static <T extends Comparable<T>> void selectionSort(T[] A, int lowIndex, int highIndex, boolean reversed)

Also very straightforward. As with insertion sort above, your sorting algorithm needs to work over a range of indices in the array, and you need to be able to sort the list backwards, if the reversed flag is true. Your algorithm should sort all elements in the array in the range lowindex..highindex (inclusive). You should not touch any of the data elements outside the range lowindex .. highindex.

public static <T extends Comparable<T>> void shellSort(T[] array, int lowindex, int highindex, boolean reversed)

Your implementation of Shell Sort needs to use Hibbard's increments: 1, 3, 7, 15, ... 2^k-1. Thus if the range of elements contains 100 elements, the first sort would be a 63-sort, followed by a 31-sort, 15-sort, 7-sort, 3-sort and 1-sort. (The code in the notes uses Shell's increments - in this case 50, 25, 12, 6, 3, 1). As with insertion sort, you need to be able to sort only a range of the array, and also be able to inverse-sort the array. This function and insertionSort should share code!
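
The increment sequence for a given range size can be generated as follows. This is a sketch under the assumption that the first increment is the largest value of the form 2^k-1 that is smaller than the number of elements, which matches the 100-element example above; HibbardSketch and increments are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class HibbardSketch {
    // Returns Hibbard's increments for n elements, largest first, ending at 1.
    static List<Integer> increments(int n) {
        List<Integer> result = new ArrayList<>();
        int h = 1;
        // Grow h through 1, 3, 7, 15, ... while the next value is still below n.
        while (2 * h + 1 < n) {
            h = 2 * h + 1;
        }
        // Walk back down: (2^k - 1 - 1) / 2 == 2^(k-1) - 1.
        for (; h >= 1; h = (h - 1) / 2) {
            result.add(h);
        }
        return result;
    }
}
```

For a range of 100 elements this produces 63, 31, 15, 7, 3, 1.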

public static <T extends Comparable<T>> void heapSort(T[] array, int lowindex, int highindex, boolean reversed)

As with insertion sort, you need to sort a range of indices. Do not copy the range to be sorted into a temporary array, sort it, and then copy back -- you need to sort the data in place. (Perhaps you should consider parent / child functions parameterized based on lowindex ...) You also need to be able to inverse sort the list.
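
The parent / child hint can be read as follows: treat lowindex as the root of the heap and shift the usual zero-based formulas by that offset. The sketch below is one possible reading, with hypothetical names (HeapIndexSketch, leftChild, rightChild, parent).

```java
class HeapIndexSketch {
    // Array index of the left child of the node stored at array index i.
    static int leftChild(int i, int lowindex) {
        return 2 * (i - lowindex) + 1 + lowindex;
    }

    // Array index of the right child of the node stored at array index i.
    static int rightChild(int i, int lowindex) {
        return 2 * (i - lowindex) + 2 + lowindex;
    }

    // Array index of the parent of the node at array index i (only meaningful for i > lowindex).
    static int parent(int i, int lowindex) {
        return (i - lowindex - 1) / 2 + lowindex;
    }
}
```

With lowindex equal to 0 these reduce to the familiar 2i+1, 2i+2 and (i-1)/2.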

public static <T extends Comparable<T>> void quickSort(T[] array, int lowindex, int highindex, boolean reversed)

This is the standard, unmodified version of quicksort. You should use a median-of-three to pick the pivot. That is, pick three elements (first, middle, and last, or three random) and use the median of those 3 elements as the pivot. This version of quicksort should not be a hybrid. Note that you will need to do some special-case work on small lists (since obviously you cannot find the median of three on a list with 2 elements).
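
Median-of-three selection on the first, middle and last elements can be sketched as below. The names MedianSketch and medianOfThreeIndex are hypothetical, the sketch assumes the range holds at least three elements, and it ignores the reversed flag, since the median of three values is the same in either order.

```java
class MedianSketch {
    // Returns the index (low, mid, or high) holding the median of the three sampled elements.
    static <T extends Comparable<T>> int medianOfThreeIndex(T[] array, int low, int high) {
        int mid = low + (high - low) / 2; // avoids overflow of (low + high)
        T a = array[low];
        T b = array[mid];
        T c = array[high];
        if (a.compareTo(b) <= 0) {
            if (b.compareTo(c) <= 0) {
                return mid;                          // a <= b <= c
            }
            return (a.compareTo(c) <= 0) ? high : low; // b is the largest
        } else {
            if (a.compareTo(c) <= 0) {
                return low;                          // b < a <= c
            }
            return (b.compareTo(c) <= 0) ? high : mid; // a is the largest
        }
    }
}
```

A common next step is to swap the chosen element to one end of the range before partitioning, but that choice is left to your implementation.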

public static <T extends Comparable<T>> void optimizedQuickSort(T[] array, int lowindex, int highindex, boolean reversed)

This should be a hybrid of quicksort and some other sorting algorithm. Make it as efficient as possible.

public static <T extends Comparable<T>> LLNode<T> insertionSortLL(LLNode<T> list, boolean reversed)

This function uses Insertion Sort to sort a linked list. It should be called in a similar fashion to tree functions, as follows:

A = insertionSortLL(A, false);

Note that when using linked lists, you need to implement insertion sort in a slightly different way (for instance, an inverse sorted list will probably give best-case performance, while a sorted list will probably give you worst-case performance).

public static <T extends Comparable<T>> LLNode<T> selectionSortLL(LLNode<T> list, boolean reversed)

Much as above, this function sorts a linked-list using selection sort. You should not allocate any extra memory for this version of selection sort (no calls to new!) Instead, you need to rearrange the linked list elements that are passed in. (Alternately, you may wish to consider moving the data elements around and keeping the structure of the linked list the same)

public static <T extends Comparable<T>> LLNode<T> mergeSortLL(LLNode<T> list, boolean reversed)

Much as above, this function sorts a linked-list using merge sort. You should not allocate any extra memory for this version of merge sort (no calls to new!) Instead, you need to rearrange the linked list elements that are passed in.

public static <T extends Comparable<T>> LLNode<T> quickSortLL(LLNode<T> list, boolean reversed)

As with all the other linked list sorting algorithms, you will need to sort the list by moving the linked list nodes around and relinking them -- not by calling new! You can have no calls to new in this method (or in any methods that it calls!). It is easiest to have the pivot be the first element in the list. Your method should partition the list into two sublists, call itself recursively on each, and then splice the lists together (which will, alas, take O(n) time). If you want to write helper methods that return a pointer to both the head and tail of the sorted list to make splicing easier, you are welcome to, but that is not required.

public static void bucketSort(int[] array, int lowindex, int highindex, boolean reversed)

Your implementation of Bucket Sort should use half as many buckets as there are elements to be sorted. Thus, if highindex - lowindex + 1 == 100, you should use 50 buckets. Assume that the data values are evenly distributed over the range of the list. You will need to do a quick run through the list to find the range of values stored in the list. You need to be able to handle sorting negative as well as positive values. While the list that you are sorting will be ints, you might need to use longs in some places when calculating bucket size. As before, you need to be able to inverse-sort the list. Note that you are sorting ints here and not Comparables -- bucketSort is not a comparison-based sorting algorithm!
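
The remark about longs matters because max - min can overflow an int when the data spans most of the int range. One way to map a value to a bucket is sketched below; BucketIndexSketch and bucketIndex are hypothetical names, and the formula is only one reasonable choice under the even-distribution assumption.

```java
class BucketIndexSketch {
    // Maps value (with min <= value <= max) to a bucket in 0 .. numBuckets - 1.
    static int bucketIndex(int value, int min, int max, int numBuckets) {
        long offset = (long) value - min;   // 0 .. 2^32 - 1, needs a long
        long range = (long) max - min + 1;  // number of distinct possible values
        // offset < range, so the quotient is always strictly below numBuckets.
        return (int) (offset * numBuckets / range);
    }
}
```

With min = -100, max = 99 and 50 buckets, the value 0 lands in bucket 25, while -100 and 99 land in buckets 0 and 49.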

public void radixSort(String[] array, int lowindex, int highindex, boolean reversed)

As with all of the other sorting algorithms, radix sort needs to be able to sort elements in the range lowindex to highindex. All versions of radix sort which we have seen so far sort lists of integers, not strings, but radix sort can easily be extended to sort strings (since you can think of strings as having a "most significant digit (character)", a "second most significant digit (character)", and so on). The only wrinkle is for strings that are different in length -- the most significant character is always the first one, and shorter strings have fewer "digits". We can get around this by first sorting the strings by their length (using a counting sort!), then running a counting sort on just the least significant characters of the longest strings, continuing until all strings are sorted.

Let's look at an example. Say we are sorting the list of strings [ "BABAB", "BA", "CB", "BAABB", "CCCAA", "C" ]. First, we sort the strings by length, giving us:
```
"C"
"BA"
"CB"
"BABAB"
"BAABB"
"CCCAA"
```
The first 3 passes of our counting sort only look at the strings of length 5. Once that round is done, we have:
```
"C"
"BA"
"CB"
"BAABB"
"BABAB"
"CCCAA"
```
Now we can add back in the strings of length 2 -- the next pass sorts strings of lengths 2 - 5, based on the second character:
```
"C"
"BA"
"BAABB"
"BABAB"
"CB"
"CCCAA"
```
Finally, we sort all the strings using the first character, to get:
```
"BA"
"BAABB"
"BABAB"
"C"
"CB"
"CCCAA"
```
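
The first step of the example, a stable counting sort keyed on string length, might look like the sketch below. LengthCountingSortSketch and sortByLength are hypothetical names, and for brevity the sketch returns a new array for the whole input rather than sorting the range lowindex..highindex in place as radixSort must.

```java
class LengthCountingSortSketch {
    // Returns the strings ordered by length; strings of equal length keep their input order.
    static String[] sortByLength(String[] input) {
        int maxLen = 0;
        for (String s : input) {
            maxLen = Math.max(maxLen, s.length());
        }
        // count[len + 1] tallies the strings of length len.
        int[] count = new int[maxLen + 2];
        for (String s : input) {
            count[s.length() + 1]++;
        }
        // After the prefix sum, count[len] is the output position of the first string of length len.
        for (int len = 1; len < count.length; len++) {
            count[len] += count[len - 1];
        }
        String[] output = new String[input.length];
        for (String s : input) {
            output[count[s.length()]++] = s;
        }
        return output;
    }
}
```

Applied to the example list, this yields "C", "BA", "CB", "BABAB", "BAABB", "CCCAA", matching the first listing above. The same counting pattern, keyed on a character position instead of the length, drives each later pass.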

What to Turn In

You need to submit to the subversion repository:

Your copy of Sort.java
Source code for your main program, which you used for performance testing
Running time for all 12 algorithms for lists of sizes 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 50000 and 100000. Be sure to subtract out the overhead time costs! You should include sorted and inverse sorted lists as well as random lists for each list size in these tests. The results should not be handwritten! Use your favorite word processor / spreadsheet / etc instead, and submit the results as a .pdf file so we can read it easily
A brief (one page is enough) document on how you created your hybrid sorting algorithm -- which approaches you tried, what different parameters you chose, and how much of a difference these changes made to the efficiency of your algorithm. This document should be either plaintext or .pdf

In addition to hardcopies, you need to submit all required files to the subversion repository:

https://www.cs.usfca.edu/svn/<username>/cs245/project3/

Put this subversion directory at the top of your printout, to make life easier on the TA. You do not want the TA to be grumpy when he is grading your code!

Due Date

This project is due midnight on Wednesday, April 19th.

Collaboration

It is OK for you to discuss solutions to this program with your classmates. However, no collaboration should ever involve looking at one of your classmate's source programs! It is usually extremely easy to determine that someone has copied a program, even when the individual doing the copying has changed identifier names and comments. Also, DO NOT copy / paste ANY code from any online resource. Start with the provided Sort.java file, and using just your knowledge of the sorting algorithms, write the code.

Computer Science 245: Data Structures and Algorithms