EECS 587, Parallel Computing

Professor: Quentin F. Stout

For Fall 2008 the class will meet Tuesday and Thursdays 12:00-13:30 in 3437 EECS.

 

You Can't Hide

It is almost impossible to buy a computer that isn't parallel. New CPUs are on chips with multiple processors (multicore systems): that's what "core duo" refers to in the ads for laptops and desktops. The Sony Playstation 3 uses a chip with several cores (the IBM Cell processor, which is also being used in some supercomputers), and some graphics processing chips have 128 specialized compute cores. The number of cores/chip will continually increase, and hence parallel computing is needed to make use of whatever system you buy. This is especially true for compute-intensive tasks such as simulations or analyzing large amounts of data.


Parallel Computers

Parallel computers are easy to build - it's the software that takes work.


Audience

Typically about half the class is from Computer Science and Engineering, and half is from a wide range of other areas throughout the sciences, engineering, and medicine. Some CSE students want to become researchers in parallel computing or grid computing, while students outside CSE typically intend to apply parallel computing to their discipline, working on problems too large to be solved via standard single processor machines. Students range from seniors through postdocs, and occasionally faculty sit in on the course as well.

Satisfying Degree Requirements

This course can be used to satisfy requirements in a variety of degree programs.

Content

The course covers a wide range of aspects of parallel computing, with the emphasis on developing efficient software for commercially available systems, such as the clusters here on campus. We will emphasize using standard software, including MPI (Message Passing Interface) and OpenMP. Use of standard software helps make the code more portable, preserving the time invested in it. Because there is not a single parallel computing model, you also have to learn some about various parallel architectures, since there is significant hardware/software interaction. This includes aspects such as interconnection networks, shared vs. distributed memory, fine-grain vs. medium-grain, MIMD/SIMD, vector processing, cache structure, etc.

We examine many reasons for poor parallel performance, and a wide range of techniques for improving it. Concepts such as domain decomposition; deterministic, probabilistic and adaptive load balancing; and sychronization will be covered. You'll learn why modest parallelization is relatively easy to achieve, and why efficient massive parallelization is quit difficult - in particular, we will cover various implications of Amdahl's Law. Examples and programs will be numeric, such as matrix multiplication, and nonnumeric, such as sorting, but they do not assume any deep knowledge in either field.

You'll also learn some about high performance computing in general. It makes no sense to spend a lot of money on parallel computers if you could get just as much performance on a serial computer had you only tuned your code. For example, intelligent use of cache is critical, and can give an order of magnitude improvement in some cases.

Here is a somewhat whimsical overview of parallel computing.

Work required

Your grade will be based on written homeworks, computer programming projects, and a final project of your choosing (though I must approve it). Often students can integrate this project in with their other work, so, for example, it may be part of their thesis. Other students have used this project to start a new research area, and have ultimately gotten a thesis in the topic. Many of the projects have lead to publications, and a few have resulted in awards of various types. For example, students in nuclear engineering might work on parallelizing serial code which is a simulation of a reactor; students interested in operating systems might develop a technique for measuring and improving performance; students interested in theoretical computer science might develop a new algorithm for a graph problem, etc. If you don't have any ideas for a project, I'll help you find some.

Prerequisites

Ability to program well in C or Fortran, plus an ability to analyze programs, plus willingness to rethink how problems should be solved. You do not need any prior experience with parallel computing or supercomputing, nor do you need to have a background in numerical analysis.

Texts

None, but we'll use some on-line computer manuals and tutorials, as well as some papers and small sections of texts. (You could always buy my book, but it is not very relevant to the course since it is focused on abstract models of parallelism and algorithms. None of the algorithms in it are appropriate for the computers we'll be using.)

Course Material

  1. Notation used.
  2. All-to-All algorithms
  3. An expositon of bitonic sort by Christian Scheideler.
  4. MPI tutorial from Edinburgh Parallel Computing Centre.
  5. MPI language reference manual. There is an MPI website which includes an online version of the manual.
  6. OpenMP reference material
  7. More will be added during the course.

Class Homepage

www.eecs.umich.edu/~qstout/587/

Miscellaneous

Some parallel computing resources available on the Web.

Everyone in this class is invited to the weekly happy hour sponsored by CSEG (the Computer Science and Engineering Graduate Student Organization). It meets 4:30 until ?, usually in 2817 CSE.


Homework Assignments

  1. Start thinking in parallel.
  2. Some ideas for the final project

 


Quentin Copyright © 2006 Quentin F. Stout