|
Department of Computer Science |
University of San Francisco |
Computer Science 220-01
Introduction to Parallel Computing
Fall 2007
MWF 11:35-12:40, HR 235
Professor: Peter Pacheco
Office: Harney 540
Phone: 422-6630
Email: domain: cs.usfca.edu, user: peter
Office Hours: MWF 3-4 and by appointment
Office Hours During Exams: R 12/6 4-5, T 12/11 1-2
TA: Ellard Li
Email: domain: usfca.edu, user: emli
Office Hour: R 3:30-4:30
Class mailing list: To sign up click
here
and type in your preferred email address. The mailing list
will then send you an email asking you to confirm that you
want to join the list. In order to join, you must reply.
To post a message to the list, send email to user cs220
in domain cs.usfca.edu
Course Syllabus
Homework Assignments
- Homework Assignment 1. Due Friday,
August 31. Here's a key.
- For Homework Assignment 2, you should modify the C
trapezoidal rule program so that it
uses a function to estimate the integral. Input arguments
to the function
should be the endpoints a and b, and the
number of trapezoids n. (Note that these are
input arguments; the values should be read in
in main, as they are in the current program.)
The return value of the function should
be the estimate of the integral. This is due Friday, September 7,
at 5 pm. Here's a solution.
- For Homework Assignment 3, you should modify the
C merge program so that it merges
two lists that have been sorted in decreasing
order and the output list is also sorted in decreasing
order. Note that you must merge the two lists:
concatenating and sorting is unacceptable. This is due
Friday, September 14. Here's a solution.
- For Homework Assignment 4, you should define a function that
swaps two pointers. Here's a driver
program that includes sample input and output.
- For Homework Assignment 5, you should modify the
MPI hello program so
that each process q sends a message to process
(q+1) % p. So, for example, if there are three
processes, 0 sends to 1, 1 sends to 2, and 2 sends to 0.
The message should be something like "Process x sends
greetings to process y!", where x and y are the appropriate
process ranks. Each process should print
out the message that it receives. You should treat the
case p = 1 as a special case and not send
any messages, just print a "message" from process 0
to itself. Here's a solution.
- For homework assignment 6, modify the
global sum program so
that a process is paired first with the process that
differs from its rank in the highest order bit, then
in the next highest order bit, etc.
For example, with 8 processes,
the processes would be paired as follows:
- First stage: (0,4), (1,5), (2,6), (3,7)
- Second stage: (0,2), (1,3)
- Third Stage: (0,1)
You should use
the same basic scheme of exclusive-or'ing process ranks with
a bitmask. You can assume that p is a power of 2.
This is due on Friday, October 12.
Here's a solution.
- Homework assignment 7:
This program uses a random
number generator to generate a matrix that's distributed
by block rows.
Write a Print_matrix function that will gather
the matrix onto process 0 and have process 0 print it
out. You must use MPI_Gather to gather the matrix
onto process 0. Here's a solution.
- For assignment 8, modify each of the serial sorting programs
serial_bubble.c
serial_ins_sort.c
serial_odd_even.c, and
serial_sel_sort.c, so that they
print out the elapsed wall clock time used in the sort. You can
use the get_time macro defined in the header
file timer.h to find the current time.
Make up a table showing the performance of each of the four
sorting methods for input sizes of 1000, 10,000, 100,000, and
1,000,000. Which of the four methods seems to be best?
You should turn in your table of timings and your evaluation
of their performance on Friday.
- Assignment 9 has three parts.
- This program attempts to
send a "message" from each thread to thread 0. Thread 0 prints
out the messages. What happens when you run this program with
from 1 to 8 threads on a node of the cluster?
- Modify the message-passing Pthreads
program so that instead of sending messages to thread 0,
each thread q sends its message to thread (q+1) % thread_count
and the receiving thread prints the message it received.
- What happens when you run your program with from 1 to 8 threads
on a node of the cluster?
Here's a solution to the second part
- For assignment 10, you should modify the
pthreads matrix-vector multiplication program
so that it uses a cyclic distribution of the rows of
A and the entries in y. When you have
the modifications working, add in calls to
the timer macro before and after the multiplication
(as in pth_mat_vect_rand_opt.c).
How does the performance of the cyclic distribution version compare
to the block distribution? Here's a
solution. The cyclic distribution seems to perform about as well
as the block distribution if the matrix is square (m = n). If m is
much greater than n, the cyclic distribution is slower. I'm not sure
what's going on if m is much less than n.
- For assignment 11, you should get an A.
- For assignment 12, modify the threaded
linked list program so that it uses read-write locks. You'll
need rwlock1.c,
rwlock1.h,
my_rand.c, and
my_rand.h. It's probably a good idea to
lock the rwlock before the call to the linked list function,
and unlock it after the call to the Add_op function.
Here's a solution.
Programming Assignments
- Programming Assignment 1. This is
due on Wednesday, September 12. Note that the due date has
been changed. Here's a solution.
- Programming Assignment 2. This is
due on Wednesday, October 3. Note that the due date has
been changed. Here's a partially
written program that you
can use as a starting point. Here's a
solution.
- Programming Assignment 3. This is
due on Monday, October 22.
Here's a solution.
- Programming Assignment 4. This is
due on Monday, November 12. Here's some code that may be use
ful:
Here's a solution
- Programming Assignment 5. This is
due on Monday, December 3. Here's some code that may be useful:
Here's a solution that uses static
partitioning, and here's a solution
that uses dynamic partitioning
Other Information
- Java implementation of the trapezoidal rule.
- C implementation of the trapezoidal rule.
- Two examples of argument passing:
- Array merge program.
- Word search program.
- Some short programs that use (or try to use) structs.
- A broken linked list program.
- A linked list program which is believed
to be OK, but . . .
- Professor Benson's
GDB tutorial
- A brief guide to using the Penguin cluster.
- A hello world program that uses MPI.
- An implementation of the trapezoidal rule
that uses MPI.
- Another implementation of the trapezoidal
rule that uses MPI. This version allows user input.
- Another implementation of the trapezoidal
rule that uses MPI. This version prints runtimes, speedups
and efficiencies.
- A key to the first midterm
- An MPI program that computes a global sum
- An implementation of Floyd's algorithm
- A program for generating matrices that can be used
as input to the floyd program.
- A function for printing a row of the adjacency matrix as a string
- A serial program that computes a dot product
- An MPI program that computes a dot product
- An MPI program that computes a dot product
using MPI_Allreduce
- An MPI program that implements a tree-structured
broadcast
- An MPI program that implements a
butterfly-structured global sum
- A serial program that implements
matrix-vector multiplication
- An MPI program that implements
matrix-vector multiplication
- A serial bubble sort program
- A serial odd-even transposition
sort program
- An MPI implementation of odd-even sort
- A program that uses the qsort library
function
- A serial insertion sort program
- A serial selection sort program
- A header file containing a macro that
computes the current time.
- Run times for parallel odd-even sort
- A Pthreads hello, world program
- A Pthreads matrix-vector multiplication
program
- A serial program for estimating pi
- An attempt at a Pthreads program for estimating pi
- An attempt at a Pthreads program for
estimating pi that uses busy waiting
- A second attempt at a Pthreads program for
estimating pi that uses busy waiting
- A Pthreads program for
estimating pi that uses mutexes
- A Pthreads program for
estimating pi that uses semaphores
- A Pthreads program that sends
messages using semaphores
- A Pthreads matrix-vector multiplication
program slightly optimized
- A Table of Run Times and Efficiencies for
Matrix-Vector Multiplication (Thanks to Greg Brown for preparing this.)
- Two programs for implementing barriers:
this one uses busy-waiting and a mutex;
this one uses semaphores.
- Another program implementing barriers. This
one uses a condition variable.
- A Pthreads program that attempts to identify
words in text input.
- A Pthreads program that successfully
identifies words in text input.
- A key to the second midterm
- A broken linked list program that uses
Pthreads.
- A simple pseudo-random number generator and
a header file
- An implementation of read-write locks. This
is a stripped down version of the implementation in David Butenhof's,
Programming with POSIX Threads.
- A header file for the implementation of
read-write locks.
- A very simple OpenMP program.
- Various implementations of the trapezoid rule.
- Run times of the implementations of
the trapezoidal rule
- Some examples of loops that OpenMP won't parallelize.
- Some attempts to write sort functions using OpenMP
Peter Pacheco
2007-12-05