Department of Computer Science
University of San Francisco
Computer Science 625
Possible Projects
You should speak to me or send me an email about your proposed
project by Monday, April 18.
Projects can be team efforts. However, no single project can have
more than two students working on it.
Duplicate projects won't be allowed, and
projects will be assigned on a first-come, first-served
basis. The first student (or team of students) to propose
a project will be assigned the project. So let me know
about your project as soon as possible.
Here's a list of ideas for possible projects. You don't have to choose
a project from this list, but you must get your project
approved.
- Implement a distributed-memory preconditioned conjugate gradient solver
for sparse linear systems. Adrian.
- Implement Gaussian elimination with submatrix partitioning and MPI.
Compare pipelined and non-pipelined implementations.
Compare block, cyclic, and block-cyclic distributions.
Compare your solver with the ScaLAPACK
solver. Srujana.
- Implement a distributed-memory preconditioned GMRES solver for sparse
linear systems. Compare your solver with the PETSc GMRES solver.
- Implement serial and distributed-memory parallel versions
of Strassen's algorithm for matrix multiplication. Compare
the performance of your parallel implementation to that of the parallel
matrix multiplication available in ScaLAPACK. Yumeng and Ludan.
- Implement Wa-Tor using MPI and dynamic load balancing. Discuss
its performance. Krichaporn and Pakkapon.
- Write a distributed-memory parallel program for repartitioning
a distributed graph so that the total weight of the cut edges
is minimized. Include code for redistributing the graph. Compare
the performance of distributed sparse matrix-vector multiplication
before and after the redistribution, and include the cost of the
redistribution in the comparison.
- Parallel sorting. Implement a variety of distributed-memory
parallel sorting algorithms. Discuss their relative performance.
Puneet and Bashar.
- Write an MPI program that uses asynchronous iteration with conjugate
gradients to solve sparse systems of equations. Compare the
performance of your solver to that of a "conventional" CG solver.
Xintian and Lin.
- Explore the latency and bandwidth of shared-memory communication between
threads when the threads are assigned to different cores of the same
processor and when they are assigned to cores on different
processors. How does the performance of the AMD systems
(grolsch, penguin, chimay) compare to that of the Intel systems
(spaten, stella)? How does the addition of communicating threads
affect performance? Kai and Chao.
- Install one or more of the software transactional memory systems
on grolsch and spaten or stella. Implement one or more of the Berkeley
"dwarfs" using the transactional memory software, Pthreads, and OpenMP.
How does the performance of the different versions compare? Neal and Leo.
- Implement a significant parallel algorithm on the cluster using OpenMP
within each node and MPI for communication between nodes. Also
implement the algorithm using only MPI. How does the performance of the
hybrid MPI/OpenMP implementation compare with that of the MPI-only
implementation? How does the difficulty of implementing the two versions
compare? Shan and Bobby.
- Implement LU factorization with forward and back substitution on a GPU.
How does the performance of your GPU solver compare to that of a serial
direct solver? Simao and Chen.
- Implement a sorting algorithm using the GPU and using Pthreads
or OpenMP. How does the performance of each implementation compare to
that of a good serial sorting algorithm? Calvin and Felix.
Peter Pacheco
2011-04-26