Theodore B. Tabe
Janis Hardwick
Quentin F. Stout
University of Michigan
Abstract: For parallel computers, the execution time of communication routines is an important determinant of the performance users observe. We measured the MPL and MPI performance of the IBM SP2 and observed that the higher-level collective communication routines show a drop in performance as the number of processors involved in the communication increases. While a few others have also studied the SP2's communication performance, they have reported only average performance and have neither commented on this drop nor determined its causes.
We generated distributions of times for these routines and developed a simulator in an attempt to recreate the observed distributions. By studying the distributions of communication times and refining the simulator, we determined that the performance decrease is due to variation in the communication times of the lower-level send-receive primitives on which the higher-level communication routines are built. This variation is in turn caused by the deleterious effects of interrupts generated by an operating system (AIX) that is not tuned for high-performance parallel computing. This behavior is sometimes known as jitter, and eliminating it is necessary for systems to use thousands of processors efficiently.
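The mechanism described above, where small variations in individual send-receives become large delays in a collective, can be illustrated with a toy model. The following Python sketch is not the authors' simulator. It assumes, hypothetically, that a collective routine consists of `steps` synchronized rounds in which each of `p_procs` processors performs one send-receive, that a round finishes only when its slowest participant does, and that an interrupt independently adds a fixed `delay` to a processor's send-receive with probability `p_interrupt`. All function names and parameter values are illustrative assumptions. Under these assumptions the expected round time is `base + delay * (1 - (1 - p_interrupt)**p_procs)`, which grows with the number of processors even though the average single send-receive time does not.

```python
import random


def expected_step_time(p_procs, base=1.0, delay=50.0, p_interrupt=0.01):
    """Expected time of one synchronized round (toy model, assumed parameters).

    The round ends only when all p_procs send-receives finish, so it pays the
    interrupt delay whenever at least one processor is interrupted.
    """
    p_any = 1.0 - (1.0 - p_interrupt) ** p_procs
    return base + delay * p_any


def simulate_collective(p_procs, steps, trials=2000, base=1.0, delay=50.0,
                        p_interrupt=0.01, seed=0):
    """Monte Carlo mean time of a hypothetical collective built from
    `steps` synchronized send-receive rounds across p_procs processors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = 0.0
        for _ in range(steps):
            # Each round costs as much as its slowest send-receive.
            t += max(base + (delay if rng.random() < p_interrupt else 0.0)
                     for _ in range(p_procs))
        total += t
    return total / trials


if __name__ == "__main__":
    for p in (2, 8, 32, 128):
        print(p, round(expected_step_time(p), 2),
              round(simulate_collective(p, steps=4, trials=500) / 4, 2))
```

For example, with the assumed defaults each individual send-receive averages `base + delay * p_interrupt`, yet `expected_step_time(128)` is more than ten times `expected_step_time(2)`, because with more processors it becomes likely that some participant in every round is interrupted.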
Our results were obtained for IBM's MPL message-passing library, which is currently the most highly tuned of the available communication libraries. Other measurements, however, show that the same results hold for the MPI (Message Passing Interface) library.
Keywords: collective communication, performance evaluation, all-to-all, benchmarking, message passing, communication overhead, operating system jitter, interrupts, heavy tails
This paper appears in Computing Science and Statistics 27 (1995), pp. 347-351.
Copyright © 2009 Quentin F. Stout