Department of Computer Science, University of San Francisco


Computer Science 220-01
Introduction to Parallel Computing
Fall 2011

MWF 11:45-12:50, KA 172


Professor: Peter Pacheco
Office: Harney 540
Phone: 422-6630
Email: domain: cs.usfca.edu, user: peter
Office Hours: M 4-5, W and F 10-11, and by appointment

TA: Shah El-Rahman
Email: domain: cs.usfca.edu, user: snelrahman
Office Hours: Tue and Thu 1:30-2:30 and 4:30-5:30 in HR 530 or HR 535

Class mailing list: The earlier instructions for joining the list were incorrect. Please send your preferred email address to the instructor, and he'll add you to the list. Once you're a member of the list you can also post messages by sending email to user cs220 in the domain cs.usfca.edu.

Course Syllabus (Here's a PDF Version.)


Programming Assignments

  1. Programming Assignment 1. Here's a solution.
  2. Programming Assignment 2. Here's a solution.
  3. Programming Assignment 3. Here's a serial program implementing Floyd-Warshall. Here's a program for generating matrices. Here's a function for printing rows of a matrix in an MPI program. Here's a solution.
  4. Programming Assignment 4. Here's a serial program for finding primes. Here's an MPI program for printing a list of ints as a string. Here's a solution.
  5. Programming Assignment 5. Here's a serial program that uses recursion. Here's a serial program that uses its own stack. Here's a solution that uses static partitioning. Here's a solution that uses dynamic partitioning.


Homework Assignments

  1. Homework Assignment 1. Here's a solution.
  2. Homework Assignment 2. We started work on a merge function in class. This code shows where we left off. Complete the program by completing the merge function and writing the two I/O functions. Due Friday, September 9 at 11 am. Here's a solution.
  3. Homework Assignment 3. Implement the Member function in the linked list program. Input arguments are the head of the list and the value to be searched for. The return value is zero if the value is not in the list and nonzero if the value is in the list. Be sure to modify the main function so that it will call the Member function and print the results. Here's a solution.
  4. Homework Assignment 4. Modify the Delete function in the int linked list program so that it deletes every occurrence of val from the list. This is due on Friday, September 23. Here's a solution.
  5. Homework Assignment 5. Write an MPI program that estimates the area under a curve using Simpson's Rule. See assignment 1, the solution to assignment 1, and the MPI trapezoidal rule program. You can assume that the number of subintervals (n) divided by the number of processes (p) is even. In addition to the estimate of the area, your program should report the time it spent in Simpson's rule. This is due on Friday, September 30. Here's a solution.
  6. Homework Assignment 6. Modify the program that computes a global sum using integer arithmetic so that it will work with any number of processes -- not just a power of 2. This is due on Friday, October 14. Here's a solution.
  7. Homework Assignment 7. Write an MPI program that implements a tree-structured broadcast function. Your program should get an int from the user and then call your broadcast function. After the broadcast function has completed, each process should print the int it received in the broadcast. Note that this is an exception to the rule that only process 0 should print results. You can assume that the number of processes is a power of two. This is due on Friday, October 21. Here's a solution.
  8. Homework Assignment 8. DAXPY stands for "Double precision Alpha X Plus Y." If x and y are n-dimensional arrays of doubles and alpha is a double, then the code for a DAXPY is
          for (i = 0; i < n; i++)
             y[i] += alpha*x[i];
       
    Write a Pthreads program that computes a DAXPY. The main thread should read in n, allocate storage for x and y, and read in x, y and alpha. When the threads have finished computing the DAXPY, the main thread should print the result. You should use a block distribution of the elements of x and y, and you can assume that n is evenly divisible by thread_count. You can make x, y, alpha, and n global variables. This is due on Friday, October 28. Here's a solution.
  9. Homework Assignment 9. Write a Pthreads program that finds the dot product of two user-input vectors. The main thread should read in the order of the vectors and their contents. It should then start thread functions, each of which computes part of the dot product. The vectors, their order, and the dot product should be stored in shared variables. The main thread should print the result. Use a cyclic partition of the vectors and busy-waiting to protect access to the critical section. This is due on Friday, November 4.
  10. Homework Assignment 10. The program many_mutexes.c repeatedly locks and unlocks a mutex. Modify it so that it uses semaphores instead of mutexes. Run each program at least three times on a node of the penguin cluster using 4 threads and n = 1,000,000. How do the minimum run times compare? (Note: the last time I checked, unnamed semaphores -- which we're using -- were not implemented on Mac OS X, so you may need to develop your semaphore program on a Linux system.) This is due on Friday, November 11. Here's a solution.
  11. Homework Assignment 11. Write an OpenMP program that estimates the area under a curve using Simpson's Rule. See assignment 1, the solution to assignment 1, and the OpenMP trapezoidal rule program. You can assume that the number of subintervals (n) divided by the number of threads is even. You don't need to time the code, but you can use timer.h if you want to. This is due on Friday, December 2. Here's a solution.
  12. Homework Assignment 12. Write an OpenMP program that implements a dot product. Use the serial dot product program as your starting point. You should use a parallel for directive to parallelize the main for loop. This is due on Wednesday, December 7. Here's a solution.
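
Homework Assignments 8 and 9 ask for two different ways of dividing array elements among threads: a block distribution for the DAXPY and a cyclic partition for the dot product. The sketch below shows only the index arithmetic each thread might use. It is an illustration, not the posted solution: the function names Daxpy_block and Dot_cyclic and their parameter lists are hypothetical, the arrays are passed as arguments instead of being global, and thread creation and the busy-waiting critical section are left out. As in Assignment 8, the block version assumes that n is evenly divisible by thread_count.

```c
/* Hypothetical helper (not from the course code): the part of a DAXPY
 * that thread `rank` would compute under a block distribution.
 * Assumes n is evenly divisible by thread_count. */
void Daxpy_block(long rank, long thread_count, long n,
                 double alpha, const double x[], double y[]) {
    long local_n = n / thread_count;   /* elements owned by each thread  */
    long first = rank * local_n;       /* first index owned by this rank */
    long last = first + local_n;       /* one past the last owned index  */
    long i;

    for (i = first; i < last; i++)
        y[i] += alpha * x[i];
}

/* Hypothetical helper (not from the course code): the partial dot
 * product that thread `rank` would compute under a cyclic partition.
 * Thread `rank` visits indices rank, rank + thread_count, and so on. */
double Dot_cyclic(long rank, long thread_count, long n,
                  const double x[], const double y[]) {
    double local_sum = 0.0;
    long i;

    for (i = rank; i < n; i += thread_count)
        local_sum += x[i] * y[i];
    return local_sum;
}
```

With n = 4 and thread_count = 2, Daxpy_block gives thread 0 the indices 0 and 1 and thread 1 the indices 2 and 3, while Dot_cyclic gives thread 0 the indices 0 and 2 and thread 1 the indices 1 and 3. In a Pthreads program each thread would still have to add its local_sum to the shared dot product inside a protected critical section.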


Other Information

  1. Brief Introduction to Subversion
  2. A Very Brief Introduction to gdb
  3. Brief Introduction to Using the Penguin Cluster.
  4. Some run-times for the MPI trapezoidal rule program.
  5. A list of topics for the first midterm.
  6. A key to the first midterm.
  7. Performance of two implementations of shared memory matrix-vector multiplication for various inputs
  8. A list of topics for the second midterm.
  9. A key to the second midterm.
  10. Performance of various shared memory implementations of the trapezoidal rule
  11. List of topics covered since the second midterm


Code Examples

  1. Trapezoidal rule implementations:
  2. Argument passing:
  3. Arrays:
  4. Strings:
  5. Linked lists:
  6. Basic MPI:
  7. Taking timings:
  8. Global sums:
  9. Linear algebra:
  10. Sorting:
  11. Basic Pthreads:
  12. Producer-consumer synchronizations:
  13. Pthreads matrix vector multiplication:
  14. Implementing barriers in Pthreads:
  15. Threadsafety:
  16. Multithreaded linked lists:
  17. Basic OpenMP:
  18. OpenMP Trapezoidal Rule
  19. Loops in OpenMP
  20. A Simple Sorting Algorithm for Shared Memory



Peter Pacheco 2011-12-07