Department of Computer Science, University of San Francisco

Computer Science 625
Possible Projects

You should speak to me or send me email regarding your proposed project by Wednesday, April 16.

Projects should be individual efforts.

Duplicate projects won't be allowed: projects will be assigned on a first-come, first-served basis, so the first student to propose a project gets it. Let me know about your project as soon as possible.

Here's a list of ideas for possible projects. You don't have to choose a project from this list, but you must get your project approved.

  1. Implement distributed-memory preconditioned conjugate gradients for sparse matrices. (A sketch of the distributed dot product at the core of parallel CG appears after this list.)
  2. Implement Gaussian elimination with submatrix partitioning and MPI. Compare pipelined and non-pipelined implementations. Compare block, cyclic, and block-cyclic distributions. Compare your solver with the ScaLAPACK solver.
  3. Implement a distributed-memory preconditioned GMRES solver for sparse linear systems. Compare your solver with the PETSc GMRES solver.
  4. Implement serial and distributed-memory parallel versions of Strassen's algorithm for matrix multiplication. Compare the performance of your parallel implementation to the parallel matrix multiplication available in ScaLAPACK. (Strassen's seven-product recursion is written out after this list.) Minglu.
  5. Implement Wa-Tor using MPI and dynamic load balancing. Discuss performance. Bin.
  6. Write a distributed-memory parallel program for repartitioning a distributed graph so that the weight of the cut edge-set is minimized. Include code for redistributing the graph. Compare the performance of distributed sparse matrix-vector multiplication before and after the redistribution, including the cost of the redistribution itself. Hao (Luby), Joseph (Spectral Bisection).
  7. Parallel sorting. Implement three or four distributed-memory parallel sorting algorithms and discuss their relative performance. (A sketch of the compare-split step that underlies many of these algorithms appears after this list.) Xiaoou.
  8. Write an MPI program that uses asynchronous iteration with conjugate gradients to solve sparse systems of equations. Compare the performance of your solver to a "conventional" CG solver.
  9. Explore the latency and bandwidth of shared-memory communication between threads when the threads are assigned to different cores of the same processor and when they are assigned to cores on different processors. How does the performance of the AMD systems (grolsch, penguin, chimay) compare to that of the Intel systems (spaten, stella)? How does increasing communication among threads affect the performance of several shared-memory programs? (A ping-pong sketch for measuring thread-to-thread latency appears after this list.) Dustin.
  10. Install one or more of the software transactional memory systems on grolsch, chimay, and spaten or stella. Implement a parallel sorting algorithm with the transactional memory software, with Pthreads, and with OpenMP. How does the performance of the three versions compare? Guangzhi.
  11. Implement a significant parallel algorithm (e.g., sorting, solving linear systems) on the penguin cluster using OpenMP for intranode communication and MPI for internode communication. Also implement the algorithm using only MPI. How does the performance of the mixed implementation compare with that of the MPI-only implementation? How did the difficulty of implementing the two versions compare? (A sketch of the hybrid structure appears after this list.) Pirakorn.
  12. Implement LU factorization and back substitution on a GPU. How does its performance compare to that of a serial direct solver?
  13. Implement a sorting algorithm using the GPU and using Pthreads or OpenMP. How does the performance of each compare to that of a good serial sorting algorithm? Robin.
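
The sketches below are illustrative starting points, not required code; all variable and function names in them are my own inventions.

For projects 1, 3, and 8, the communication kernels of a Krylov solver are a good place to start: the inner products are usually the synchronization bottleneck, since each one requires a global reduction. A minimal sketch of a distributed dot product, assuming each process owns local_n contiguous entries of each vector:

    #include <mpi.h>

    /* Dot product of two distributed vectors.  Each process owns
       local_n entries of x and y; every process gets the global
       result. */
    double par_dot(const double x[], const double y[], int local_n,
                   MPI_Comm comm) {
        double local_sum = 0.0, global_sum;
        for (int i = 0; i < local_n; i++)
            local_sum += x[i] * y[i];
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                      MPI_SUM, comm);
        return global_sum;
    }

For project 8, note that the asynchronous variant is interesting precisely because it avoids this kind of blocking collective.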
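
For project 4, recall the structure of Strassen's method: with A, B, and C partitioned into 2x2 blocks, the standard algorithm uses eight block products, while Strassen's uses seven:

    M1 = (A11 + A22)(B11 + B22)
    M2 = (A21 + A22) B11
    M3 = A11 (B12 - B22)
    M4 = A22 (B21 - B11)
    M5 = (A11 + A12) B22
    M6 = (A21 - A11)(B11 + B12)
    M7 = (A12 - A22)(B21 + B22)

    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

Applying this recursively to the block products gives the O(n^2.81) running time; a practical implementation switches to the conventional algorithm below some crossover block size.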
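
For project 7, many distributed-memory sorts (odd-even transposition, bitonic sort) are built from a compare-split step: two processes exchange their sorted local blocks, and one keeps the smaller half while the other keeps the larger. A sketch of that step, assuming each process holds n sorted keys and the caller provides scratch buffers:

    #include <mpi.h>
    #include <string.h>

    /* Exchange sorted blocks with partner; keep the n smallest keys
       if keep_small is nonzero, the n largest otherwise.  my_keys
       and recv hold n doubles each; merged holds 2n. */
    void compare_split(double my_keys[], double recv[], double merged[],
                       int n, int partner, int keep_small, MPI_Comm comm) {
        MPI_Sendrecv(my_keys, n, MPI_DOUBLE, partner, 0,
                     recv,    n, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        /* Merge the two sorted runs. */
        int i = 0, j = 0;
        for (int k = 0; k < 2*n; k++)
            merged[k] = (j >= n || (i < n && my_keys[i] <= recv[j]))
                        ? my_keys[i++] : recv[j++];
        /* Keep the appropriate half. */
        memcpy(my_keys, keep_small ? merged : merged + n,
               n * sizeof(double));
    }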
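
For project 9, thread-to-thread latency can be estimated with a flag-passing ping-pong. A minimal sketch, assuming a compiler with C11 atomics; pinning the two threads to specific cores (e.g., with pthread_setaffinity_np on Linux) is omitted here but is essential for the core-placement comparisons the project asks for:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 1000000

    atomic_int flag;   /* 0: main's turn to send, 1: pong's turn */

    void* pong(void* arg) {
        for (int i = 0; i < ITERS; i++) {
            while (atomic_load(&flag) != 1) ;   /* wait for ping */
            atomic_store(&flag, 0);             /* reply */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t;
        struct timespec start, finish;
        pthread_create(&t, NULL, pong, NULL);
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERS; i++) {
            atomic_store(&flag, 1);             /* ping */
            while (atomic_load(&flag) != 0) ;   /* wait for reply */
        }
        clock_gettime(CLOCK_MONOTONIC, &finish);
        pthread_join(t, NULL);
        double secs = (finish.tv_sec - start.tv_sec)
                    + (finish.tv_nsec - start.tv_nsec) / 1e9;
        printf("mean round trip: %.1f ns\n", secs / ITERS * 1e9);
        return 0;
    }

Half the round-trip time gives a rough one-way latency; bandwidth can be measured similarly by copying a large buffer between the threads each iteration.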
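
For project 11, the usual hybrid structure is one MPI process per node with an OpenMP team inside each process. A minimal sketch of a hybrid global sum, assuming the MPI library supports MPI_THREAD_FUNNELED (only the master thread makes MPI calls):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char* argv[]) {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each process owns a local block of the global array. */
        enum { LOCAL_N = 1000000 };
        static double x[LOCAL_N];
        for (int i = 0; i < LOCAL_N; i++) x[i] = 1.0;

        /* Intranode: OpenMP threads reduce the local block. */
        double local_sum = 0.0;
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < LOCAL_N; i++)
            local_sum += x[i];

        /* Internode: MPI combines the per-process sums. */
        double global_sum;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);
        if (rank == 0) printf("global sum = %e\n", global_sum);

        MPI_Finalize();
        return 0;
    }

The MPI-only version replaces the OpenMP loop with one MPI process per core; comparing the two on penguin is the substance of the project.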

Peter Pacheco
2014-04-17