Theodore B. Tabe
Janis Hardwick
Quentin F. Stout
University of Michigan
Abstract: For parallel computers, the execution time of communication routines is an important determinant of the performance users obtain. We measured the MPL and MPI performance of the IBM SP2, observing that all of the higher-level communication routines show a drop in performance as the number of processors involved in the communication increases. While a few others have also recently studied the SP2's communication performance, they have reported only average performance, and have neither commented on the drop in performance nor determined its causes.
We generated a distribution of times for these routines and developed a simulator in an attempt to recreate the observed distribution. By studying distributions of communication times and by refining the simulator, we were able to discern that the performance decrease is due to the variation in the communication times of the lower-level send-receive primitives upon which the higher-level communication routines are built. This variation is in turn caused by the deleterious effects of interrupts generated by an operating system (AIX) untuned to high-performance parallel computing.
Our results were obtained for IBM's MPL message-passing library, which is currently the most highly tuned of the communication libraries available. However, other measurements show that the same results hold for the MPI (Message Passing Interface) library.
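The mechanism described above can be illustrated with a toy model. The following Python sketch is not the authors' simulator; it assumes, purely for illustration, that a collective operation finishes only when the slowest of its send-receive primitives finishes, and that each primitive takes a fixed base time plus a rare, large delay standing in for an operating-system interrupt. The function names and the parameter values (`base`, `p_interrupt`, `interrupt_cost`) are hypothetical and are not taken from the paper.

```python
import random


def primitive_time(rng, base=1.0, p_interrupt=0.01, interrupt_cost=20.0):
    """One send-receive: base latency plus a rare large delay (made-up values)."""
    delayed = rng.random() < p_interrupt
    return base + (interrupt_cost if delayed else 0.0)


def collective_time(rng, n_procs, **kw):
    """Toy collective: done when the slowest of n_procs primitives is done."""
    return max(primitive_time(rng, **kw) for _ in range(n_procs))


def mean_collective_time(n_procs, trials=20000, seed=0, **kw):
    """Monte Carlo estimate of the mean collective time for n_procs processors."""
    rng = random.Random(seed)
    total = sum(collective_time(rng, n_procs, **kw) for _ in range(trials))
    return total / trials


def expected_collective_time(n_procs, base=1.0, p_interrupt=0.01,
                             interrupt_cost=20.0):
    """Closed form for this toy model: base + cost * P(at least one delay)."""
    p_any = 1.0 - (1.0 - p_interrupt) ** n_procs
    return base + interrupt_cost * p_any


if __name__ == "__main__":
    # Compare the simulated mean with the closed form as processors are added.
    for n in (2, 8, 32, 64):
        print(n, round(mean_collective_time(n), 3),
              round(expected_collective_time(n), 3))
```

Under these assumptions the mean time of the collective rises with the processor count even though the typical primitive time is unchanged, because the chance that at least one participant is delayed grows as more processors take part. This is also why an average alone hides the effect, and why the full distribution of times is informative.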
Keywords: parallel computing, communication systems, performance evaluation, all-to-all communication, benchmarking, message passing, communication overhead, operating system, interrupts, heavy tails, statistics, computer science
Complete paper. This paper appears in Computing Science and Statistics 27 (1995), pp. 347-351.
Copyright © 2008 Quentin F. Stout