Department of Computer Science
University of San Francisco
Computer Science 220-01
Introduction to Parallel Computing
Fall 2011
MWF 11:45-12:50, KA 172
Professor: Peter Pacheco
Office: Harney 540
Phone: 422-6630
Email: domain: cs.usfca.edu, user: peter
Office Hours: M 4-5, W and F 10-11, and by appointment
TA: Shah El-Rahman
Email: domain: cs.usfca.edu, user: snelrahman
Office Hours: Tue and Thu 1:30-2:30 and 4:30-5:30 in HR 530
or HR 535
Class mailing list: The earlier instructions for joining
the list were incorrect. Please send your preferred
email address to the instructor, and he'll add you to the list.
Once you're a member of the list you can also post messages by sending
email to user cs220 in the domain cs.usfca.edu.
Course Syllabus (Here's a
PDF Version.)
Programming Assignments
- Programming Assignment 1.
Here's a solution.
- Programming Assignment 2.
Here's a solution.
- Programming Assignment 3. Here's
a serial program implementing
Floyd-Warshall. Here's a program
for generating matrices. Here's a
function for printing rows of
a matrix in an MPI program. Here's a
solution.
- Programming Assignment 4. Here's a
serial program for finding
primes. Here's an MPI
program for printing a list of ints as a string.
Here's a solution.
- Programming Assignment 5. Here's a
serial program that uses
recursion.
Here's a serial program that uses
its own stack. Here's a
solution that uses static partitioning.
Here's a
solution that uses dynamic partitioning.
Homework Assignments
- Homework Assignment 1. Here's
a solution.
- Homework Assignment 2. We started work on a merge
function in class. This code
shows where we left off. Complete the program by completing the
merge function and writing the two I/O functions. Due Friday,
September 9 at 11 am. Here's a solution.
- Homework Assignment 3. Implement the Member function in
the linked list
program. Input arguments are the head of the list and
the value to be searched for. The return value is zero if
the value is not in the list and nonzero if the value is in the
list. Be sure to modify the main function so that it will
call the Member function and print the results.
Here's a solution.
- Homework Assignment 4. Modify the Delete function
in the int linked list
program so that it deletes every occurrence of
val from the list. This is due on Friday, September 23.
Here's a solution.
- Homework Assignment 5. Write an MPI program that estimates
the area under a curve using Simpson's Rule. See
assignment 1, the solution
to assignment 1, and the MPI
trapezoidal rule program. You can assume that the number of
subintervals (n) divided by the number of processes (p) is even.
In addition to the estimate of the area, your program should
report the time it spent in Simpson's rule. This is due on
Friday, September 30.
Here's a solution.
- Homework Assignment 6. Modify the
program that computes a global sum using integer arithmetic so that
it will work with any number of processes -- not just a power of 2.
This is due on Friday, October 14.
Here's a solution.
- Homework Assignment 7. Write an MPI program that implements a
tree-structured broadcast function. Your program should get
an int from the user and then call your broadcast function.
After the broadcast function has completed, each process should
print the int it received in the broadcast. Note that this
is an exception to the rule that only process 0 should print
results. You can assume that the number of processes is
a power of two. This is due on Friday, October 21.
Here's a solution.
- Homework Assignment 8. DAXPY stands for "Double precision
Alpha X Plus Y." If x and y are n-dimensional arrays of doubles
and alpha is a double, then the code for a DAXPY is
    for (i = 0; i < n; i++)
        y[i] += alpha*x[i];
Write a Pthreads program that computes a DAXPY. The main thread should
read in n, allocate storage for x and y, and read in x, y and alpha.
When the threads have finished computing the DAXPY, the main thread
should print the result. You should use a block distribution of the
elements of x and y, and you can assume that n is evenly divisible
by thread_count. You can make x, y, alpha, and n global variables.
This is due on Friday, October 28. Here's a
solution.
- Homework Assignment 9. Write a Pthreads program that finds
the dot product of two user-input vectors. The main thread
should read in the order of the vectors and their
contents. It should then start thread functions, each of
which computes part of the dot product. The vectors, their
order, and the dot product should be stored in shared variables.
The main thread should print the result. Use a cyclic
partition of the vectors and busy-waiting to protect access
to the critical section. This is due on Friday, November 4.
- Homework Assignment 10. The program
many_mutexes.c repeatedly
locks and unlocks a mutex. Modify it so that it uses semaphores
instead of mutexes.
Run each program at least three times on a node of the penguin
cluster using 4 threads and n = 1,000,000. How do the minimum
run times compare? (Note: the last time I checked, unnamed
semaphores -- which we're using -- were not implemented on Mac OS X,
so you may need to develop your semaphore program on a Linux system.)
This is due on Friday, November 11.
Here's a solution.
- Homework Assignment 11. Write an OpenMP program that estimates
the area under a curve using Simpson's Rule. See
assignment 1, the solution
to assignment 1, and the OpenMP
trapezoidal rule program. You can assume that the number of
subintervals (n) divided by the number of threads is even.
You don't need to time the code, but you can use
timer.h if you
want to.
This is due on Friday, December 2.
Here's a solution.
- Homework Assignment 12. Write an OpenMP program that
implements a dot product. Use the
serial dot product program as your starting point. You should
use a parallel for directive to parallelize the main
for loop. This is due on Wednesday, December 7.
Here's a solution.
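Homework Assignments 5 and 11 both assume that the number of subintervals divided by the number of processes (or threads) is even. The serial C sketch below suggests why: Simpson's Rule needs an even number of subintervals, so if each process applies it to its own block of n/p subintervals, n/p has to be even. This is only an illustration, not the posted solution. The names simpson, simpson_blocks, and cube are hypothetical, and the loop over ranks stands in for what MPI processes or OpenMP threads would do concurrently.

```c
/* Integrand type: any function of one double. */
typedef double (*integrand_t)(double);

/* Composite Simpson's Rule on [a, b] with n subintervals.
   Assumption: n is a positive even integer. */
double simpson(integrand_t f, double a, double b, int n) {
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; i++) {
        /* Interior points alternate between weight 4 (odd i)
           and weight 2 (even i). */
        sum += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);
    }
    return sum * h / 3.0;
}

/* Serial stand-in for the parallel version: each of p "ranks" applies
   Simpson's Rule to its own block of n/p subintervals, and the partial
   results are added, as a reduction would do.
   Assumption: n is divisible by p and n/p is even. */
double simpson_blocks(integrand_t f, double a, double b, int n, int p) {
    double h = (b - a) / n;
    int local_n = n / p;
    double total = 0.0;
    for (int rank = 0; rank < p; rank++) {
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;
        total += simpson(f, local_a, local_b, local_n);
    }
    return total;
}

/* Example integrand (hypothetical, not taken from the course files). */
double cube(double x) { return x * x * x; }
```

Since Simpson's Rule is exact for cubic polynomials, both simpson(cube, 0.0, 2.0, 12) and simpson_blocks(cube, 0.0, 2.0, 12, 3) should agree with the exact area, 4, up to rounding error.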
Other Information
- Brief Introduction to Subversion
- A Very Brief Introduction to gdb
- Brief Introduction to Using the Penguin
Cluster.
- Some run-times for
the MPI trapezoidal rule
program.
- A list of topics for the first
midterm.
- A key to the first
midterm.
- Performance of two
implementations of shared memory matrix-vector
multiplication for various inputs
- A list of topics for the second
midterm.
- A key to the second
midterm.
- Performance of various
shared memory implementations of the trapezoidal rule
- List of topics covered since
the second midterm
Code Examples
- Trapezoidal rule implementations:
- Argument passing:
- Arrays:
- Strings:
- Linked lists:
- Basic MPI:
- Taking timings:
- Global sums:
- A global
sum program that uses MPI and modular arithmetic.
- A global
sum program that uses MPI and modular arithmetic.
This version prints debug information.
- A global
sum program that uses MPI and modular arithmetic.
This version will work with any number of processes.
- Another global
sum program. This version uses bitwise operations.
- Another global
sum program. This version uses bitwise operations,
and works with any number of processes.
- Another global
sum program. This version returns the sum on all
the processes using a butterfly.
- Another global
sum program. This version returns the sum on all
the processes using a ring-pass.
- Linear algebra:
- A serial program
for finding the dot product of two vectors.
- An MPI program
for finding the dot product of two vectors. Only process 0
returns the dot product.
- An MPI program
for finding the dot product of two vectors. This version returns
the dot product on all the processes.
- A serial program
for finding a matrix-vector product.
- An MPI program
for finding a matrix-vector product. The matrix has a block-row
distribution and the vectors have block distributions.
- Sorting:
- Basic Pthreads:
- Producer-consumer synchronizations:
- Pthreads matrix-vector multiplication:
- Implementing barriers in Pthreads:
- Thread safety:
- Multithreaded linked lists:
- Basic OpenMP:
- OpenMP Trapezoidal Rule
- Loops in OpenMP
- A Simple Sorting Algorithm for Shared Memory
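The global sum entries in the list above mention versions that use bitwise operations and work with any number of processes. As a rough illustration of that idea, here is a serial C simulation of a tree-structured sum. It is a sketch under stated assumptions rather than the course's MPI code: the function name tree_sum is hypothetical, the array vals stands in for the values held by the individual processes, and an addition into vals[rank] stands in for a receive from the partner.

```c
/* Serial simulation of a tree-structured global sum over p "ranks".
   vals[r] is the value owned by rank r. The array is overwritten with
   partial sums, and the total ends up in vals[0], which is returned.
   Assumption: p >= 1. */
int tree_sum(int vals[], int p) {
    for (unsigned bitmask = 1; bitmask < (unsigned) p; bitmask <<= 1) {
        /* Receivers at this stage are the multiples of 2*bitmask;
           each receiver's partner is the rank with the bitmask bit set. */
        for (unsigned rank = 0; rank < (unsigned) p; rank += 2 * bitmask) {
            unsigned partner = rank | bitmask;
            /* When p is not a power of 2 the partner may not exist,
               in which case the receiver simply skips this stage. */
            if (partner < (unsigned) p)
                vals[rank] += vals[partner];
        }
    }
    return vals[0];
}
```

The check partner < p is what lets the scheme run with a process count that is not a power of 2. In an MPI version, the rank with the bitmask bit set would presumably send its partial sum and then drop out of later stages.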
Peter Pacheco
2011-12-07