Evaluation of Communication Performance in Distributed and Parallel Systems

Graduate Students: Ashish Mehra, Jennifer Rexford, and Wu-chang Feng

Faculty: Kang G. Shin

Sponsors: NSF, ONR

The performance of parallel and distributed applications directly depends on communication performance. It is imperative, therefore, to evaluate the components comprising the communication subsystem, ranging from front-end network devices to higher-level communication protocols. The division of functionality between these hardware and software components can have direct performance implications. The complexity of these components, coupled with their subtle interactions, introduces myriad design alternatives. Investigating these design options in a controlled manner necessitates a well-formed evaluation framework. This research integrates a variety of evaluation tools and techniques to characterize communication performance and influence the design of new communication architectures.

A variety of inter-related factors, including network topology, routing, switching, and flow control, impinge on communication performance. Modeling facilitates a cost-effective investigation of this large design space to gain a deeper understanding of the interactions between these design parameters. Analytic and simulation models allow consideration of various design choices and their effects on packet/message latency, variability of delay, and implementation complexity. A point-to-point message simulation environment, developed in-house, facilitates experimentation with these parameters in a single, controlled environment. This object-oriented simulator provides a toolbox of primitives for a variety of network topologies, communication workloads, routing algorithms, and hardware models. These hardware models can vary from high-level designs to low-level specifications of actual devices, allowing incremental investigation of implementation approaches.
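
As a rough illustration of how topology, routing, and switching choices feed into message latency, the following Python sketch models a 2-D mesh with dimension-order routing and compares two textbook switching schemes under no contention. It is not the in-house simulator described above; the names Mesh2D, route_xy, store_and_forward_latency, and cut_through_latency are hypothetical, and the latency formulas assume one flit crosses one link per time unit with no queueing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mesh2D:
    """Hypothetical 2-D mesh topology; nodes are (x, y) coordinates."""
    width: int
    height: int

    def _check(self, node):
        x, y = node
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError(f"node {node} is outside the mesh")

    def route_xy(self, src, dst):
        """Dimension-order routing: correct the X offset first, then Y.

        Returns the list of nodes visited, including src and dst.
        """
        self._check(src)
        self._check(dst)
        x, y = src
        path = [(x, y)]
        while x != dst[0]:
            x += 1 if dst[0] > x else -1
            path.append((x, y))
        while y != dst[1]:
            y += 1 if dst[1] > y else -1
            path.append((x, y))
        return path


def store_and_forward_latency(hops, packet_flits, flit_time=1.0):
    """Each hop receives the whole packet before forwarding it."""
    return hops * packet_flits * flit_time


def cut_through_latency(hops, packet_flits, flit_time=1.0):
    """The header advances one hop per flit time; the body pipelines behind it."""
    return (hops + packet_flits - 1) * flit_time


if __name__ == "__main__":
    mesh = Mesh2D(width=4, height=4)
    path = mesh.route_xy((0, 0), (2, 3))
    hops = len(path) - 1
    print("path:", path)
    print("store-and-forward:", store_and_forward_latency(hops, packet_flits=8))
    print("cut-through:", cut_through_latency(hops, packet_flits=8))
```

For the 5-hop route from (0, 0) to (2, 3) and an 8-flit packet, the uncontended model gives 40 time units for store-and-forward and 12 for cut-through. A real simulator would add contention, buffering, and flow control on top of this.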

An accurate comparison of different designs requires the models to capture the effects of realistic communication patterns. These traffic patterns stem from application-level constructs, such as request/response or bulk data transfer, and the protocol software that converts these operations into messages and packets. Likewise, the network front-end generates data and control patterns that influence the operation of these protocols. While analytic and simulation models lend insight into low-level routing and flow control, characterizing protocol software requires direct measurement of functioning systems.
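
To make the link between application-level constructs and packet-level traffic concrete, the sketch below shows how a toy protocol layer might turn a request/response exchange and a bulk transfer into packet sequences. It is only an assumption-laden illustration: the function names, the 1024-byte payload limit, and the 32-byte header are hypothetical and do not describe any particular protocol stack.

```python
def packetize(message_bytes, max_payload=1024, header_bytes=32):
    """Split one message into on-the-wire packet sizes (payload plus header).

    A zero-length message still produces one header-only packet.
    """
    if message_bytes < 0 or max_payload <= 0 or header_bytes < 0:
        raise ValueError("sizes must be non-negative and max_payload positive")
    sizes = []
    remaining = message_bytes
    while True:
        payload = min(remaining, max_payload)
        sizes.append(payload + header_bytes)
        remaining -= payload
        if remaining <= 0:
            break
    return sizes


def request_response(request_bytes, reply_bytes):
    """Short message in each direction: few, small packets both ways."""
    return {
        "client_to_server": packetize(request_bytes),
        "server_to_client": packetize(reply_bytes),
    }


def bulk_transfer(total_bytes):
    """One-way transfer: a run of full-size packets and a shorter tail."""
    return {
        "client_to_server": packetize(total_bytes),
        "server_to_client": [],
    }


if __name__ == "__main__":
    print(request_response(64, 200))
    print(bulk_transfer(2500))
```

Under these assumed sizes, the request/response exchange yields one 96-byte packet and one 232-byte packet in opposite directions, while the 2500-byte bulk transfer yields packets of 1056, 1056, and 484 bytes in one direction. The same application-level volume can therefore present very different patterns to the network.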

Protocol software can be viewed as a black box, converting application-generated communication requests into the appropriate commands for the underlying communication device. Protocol execution inside the black box can be viewed as a transformation of input communication patterns into output communication patterns. The execution characteristics of the protocol software determine this transformation function. A controlled, monitored execution environment is, therefore, essential for characterizing the behavior of protocol software. Synthetic workloads, which can generate specified communication patterns, provide a set of communication benchmarks for this characterization. A synthetic workload generator, developed in-house, facilitates the generation and execution of synthetic workloads. In addition, a performance monitoring tool, also developed in-house, records the duration and frequency of specified events. These tools are being employed, with communicating tasks as the synthetic workload, to study protocol execution in the x-kernel, a communication executive that allows protocols to be composed. This characterization assists in identifying design optimizations, such as determining the location and functionality of the hardware/software boundary, and techniques for tuning application-generated patterns to match those most effectively supported by the communication subsystem.
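
The measurement approach described above can be sketched in a few lines of Python: a seeded generator produces a reproducible synthetic workload, and a small monitor records how often a named event occurs and how long it takes in total. This is a minimal sketch under stated assumptions, not the in-house tools or the x-kernel interface; EventMonitor, synthetic_workload, toy_protocol_send, and run are hypothetical names, and the toy protocol simply counts the packets needed for each message.

```python
import random
import time
from collections import defaultdict
from contextlib import contextmanager


class EventMonitor:
    """Records the frequency and cumulative duration of named events."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.counts = defaultdict(int)
        self.total_time = defaultdict(float)

    @contextmanager
    def measure(self, event):
        start = self._clock()
        try:
            yield
        finally:
            self.counts[event] += 1
            self.total_time[event] += self._clock() - start


def synthetic_workload(n_messages, mean_gap, sizes, seed=0):
    """Return (send_time, message_bytes) pairs with exponential inter-arrival gaps."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    now = 0.0
    workload = []
    for _ in range(n_messages):
        now += rng.expovariate(1.0 / mean_gap)
        workload.append((now, rng.choice(sizes)))
    return workload


def toy_protocol_send(message_bytes, max_payload=1024):
    """Stand-in for the black box: number of packets for one message."""
    return max(1, -(-message_bytes // max_payload))  # ceiling division


def run(workload, monitor):
    """Feed the workload through the toy protocol while monitoring each send."""
    packets = 0
    for _send_time, message_bytes in workload:
        with monitor.measure("send"):
            packets += toy_protocol_send(message_bytes)
    return packets


if __name__ == "__main__":
    monitor = EventMonitor()
    workload = synthetic_workload(100, mean_gap=0.01, sizes=[64, 4096], seed=1)
    total_packets = run(workload, monitor)
    print("sends:", monitor.counts["send"], "packets:", total_packets)
    print("total send time (s):", monitor.total_time["send"])
```

Passing the clock as a parameter is a design choice that lets the monitor be driven by a simulated clock as well as a real one. Comparing the input pattern (the workload list) with the output pattern (packet counts and timings) gives a crude view of the transformation performed by the black box.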