Department of Computer Science University of San Francisco


Computer Science 220-01
Introduction to Parallel Computing
Fall 2007

MWF 11:35-12:40, HR 235


Professor: Peter Pacheco
Office: Harney 540
Phone: 422-6630
Email: domain: cs.usfca.edu, user: peter
Office Hours: MWF 3-4 and by appointment
Office Hours During Exams: R 12/6 4-5, T 12/11 1-2

TA: Ellard Li
Email: domain: usfca.edu, user: emli
Office Hour: R 3:30-4:30

Class mailing list: To sign up click here and type in your preferred email address. The mailing list will then send you an email asking you to confirm that you want to join the list. In order to join, you must reply. To post a message to the list, send email to user cs220 in domain cs.usfca.edu

Course Syllabus


Homework Assignments

  1. Homework Assignment 1. Due Friday, August 31. Here's a key.
  2. For Homework Assignment 2, you should modify the C trapezoidal rule program so that it uses a function to estimate the integral. Input arguments to the function should be the endpoints a and b, and the number of trapezoids n. (Note that these are input arguments; the values should be read in in main, as they are in the current program.) The return value of the function should be the estimate of the integral. This is due Friday, September 7, at 5 pm. Here's a solution.
  3. For Homework Assignment 3, you should modify the C merge program so that it merges two lists that have been sorted in decreasing order and the output list is also sorted in decreasing order. Note that you must merge the two lists: concatenating and sorting is unacceptable. This is due Friday, September 14. Here's a solution.
  4. For Homework Assignment 4, you should define a function that swaps two pointers. Here's a driver program that includes sample input and output.
  5. For Homework Assignment 5, you should modify the MPI hello program so that each process q sends a message to process (q+1) % p. So, for example, if there are three processes, 0 sends to 1, 1 sends to 2, and 2 sends to 0. The message should be something like "Process x sends greetings to process y!", where x and y are the appropriate process ranks. Each process should print out the message that it receives. You should treat the case p = 1 as a special case and not send any messages, just print a "message" from process 0 to itself. Here's a solution.
  6. For homework assignment 6, modify the global sum program so that a process is paired first with the process that differs from its rank in the highest order bit, then in the next highest order bit, etc. For example, with 8 processes, the processes would be paired as follows:
    1. First stage: (0,4), (1,5), (2,6), (3,7)
    2. Second stage: (0,2), (1,3)
    3. Third Stage: (0,1)
    You should use the same basic scheme of exclusive-or'ing process ranks with a bitmask. You can assume that p is a power of 2. This is due on Friday, October 12. Here's a solution.
  7. Homework assignment 7: This program uses a random number generator to generate a matrix that's distributed by block rows. Write a Print_matrix function that will gather the matrix onto process 0 and have process 0 print it out. You must use MPI_Gather to gather the matrix onto process 0. Here's a solution.
  8. For assignment 8, modify each of the serial sorting programs serial_bubble.c serial_ins_sort.c serial_odd_even.c, and serial_sel_sort.c, so that they print out the elapsed wall clock time used in the sort. You can use the get_time macro defined in the header file timer.h to find the current time. Make up a table showing the performance of each of the four sorting methods for input sizes of 1000, 10,000, 100,000, and 1,000,000. Which of the four methods seems to be best? You should turn in your table of timings and your evaluation of their performance on Friday.
  9. Assignment 9 has three parts. Here's a solution to the second part
  10. For assignment 10, you should modify the pthreads matrix-vector multiplication program so that it uses a cyclic distribution of the rows of A and the entries in y. When you have the modifications working, add in calls to the timer macro before and after the multiplication (as in pth_mat_vect_rand_opt.c). How does the performance of the cyclic distribution version compare to the block distribution? Here's a solution. The cyclic distribution seems to perform about as well as the block distribution if the matrix is square (m = n). If m is much greater than n, the cyclic distribution is slower. I'm not sure what's going on if m is much less than n.
  11. For assignment 11, you should get an A.
  12. For assignment 12, modify the threaded linked list program so that it uses read-write locks. You'll need rwlock1.c, rwlock1.h, my_rand.c, and my_rand.h. It's probably a good idea to lock the rwlock before the call to the linked list function, and unlock it after the call to the Add_op function. Here's a solution.

Programming Assignments
  1. Programming Assignment 1. This is due on Wednesday, September 12. Note that the due date has been changed. Here's a solution.
  2. Programming Assignment 2. This is due on Wednesday, October 3. Note that the due date has been changed. Here's a partially written program that you can use as a starting point. Here's a solution.
  3. Programming Assignment 3. This is due on Monday, October 22. Here's a solution.
  4. Programming Assignment 4. This is due on Monday, November 12. Here's some code that may be use ful: Here's a solution
  5. Programming Assignment 5. This is due on Monday, December 3. Here's some code that may be useful: Here's a solution that uses static partitioning, and here's a solution that uses dynamic partitioning

Other Information
  1. Java implementation of the trapezoidal rule.
  2. C implementation of the trapezoidal rule.
  3. Two examples of argument passing:
  4. Array merge program.
  5. Word search program.
  6. Some short programs that use (or try to use) structs.
  7. A broken linked list program.
  8. A linked list program which is believed to be OK, but . . .
  9. Professor Benson's GDB tutorial
  10. A brief guide to using the Penguin cluster.
  11. A hello world program that uses MPI.
  12. An implementation of the trapezoidal rule that uses MPI.
  13. Another implementation of the trapezoidal rule that uses MPI. This version allows user input.
  14. Another implementation of the trapezoidal rule that uses MPI. This version prints runtimes, speedups and efficiencies.
  15. A key to the first midterm
  16. An MPI program that computes a global sum
  17. An implementation of Floyd's algorithm
  18. A program for generating matrices that can be used as input to the floyd program.
  19. A function for printing a row of the adjacency matrix as a string
  20. A serial program that computes a dot product
  21. An MPI program that computes a dot product
  22. An MPI program that computes a dot product using MPI_Allreduce
  23. An MPI program that implements a tree-structured broadcast
  24. An MPI program that implements a butterfly-structured global sum
  25. A serial program that implements matrix-vector multiplication
  26. An MPI program that implements matrix-vector multiplication
  27. A serial bubble sort program
  28. A serial odd-even transposition sort program
  29. An MPI implementation of odd-even sort
  30. A program that uses the qsort library function
  31. A serial insertion sort program
  32. A serial selection sort program
  33. A header file containing a macro that computes the current time.
  34. Run times for parallel odd-even sort
  35. A Pthreads hello, world program
  36. A Pthreads matrix-vector multiplication program
  37. A serial program for estimating pi
  38. An attempt at a Pthreads program for estimating pi
  39. An attempt at a Pthreads program for estimating pi that uses busy waiting
  40. A second attempt at a Pthreads program for estimating pi that uses busy waiting
  41. A Pthreads program for estimating pi that uses mutexes
  42. A Pthreads program for estimating pi that uses semaphores
  43. A Pthreads program that sends messages using semaphores
  44. A Pthreads matrix-vector multiplication program slightly optimized
  45. A Table of Run Times and Efficiencies for Matrix-Vector Multiplication (Thanks to Greg Brown for preparing this.)
  46. Two programs for implementing barriers: this one uses busy-waiting and a mutex; this one uses semaphores.
  47. Another program implementing barriers. This one uses a condition variable.
  48. A Pthreads program that attempts to identify words in text input.
  49. A Pthreads program that successfully identifies words in text input.
  50. A key to the second midterm
  51. A broken linked list program that uses Pthreads.
  52. A simple pseudo-random number generator and a header file
  53. An implementation of read-write locks. This is a stripped down version of the implementation in David Butenhof's, Programming with POSIX Threads.
  54. A header file for the implementation of read-write locks.
  55. A very simple OpenMP program.
  56. Various implementations of the trapezoid rule.
  57. Run times of the implementations of the trapezoidal rule
  58. Some examples of loops that OpenMP won't parallelize.
  59. Some attempts to write sort functions using OpenMP



Peter Pacheco 2007-12-05