About the Event
Graphics Processing Units (GPUs) are becoming common in data centers for tasks like neural network training and image processing due to their high performance and efficiency. GPUs maintain high throughput by running thousands of threads simultaneously, issuing instructions from ready threads to hide the latency of stalled threads. While this keeps the arithmetic units busy, the challenge in GPU design is moving data to those units at the same high rate. Any inefficiency in data movement and storage compromises the throughput and energy efficiency of the system.
GPUs in the data center have design goals that make careful data management more critical than before. Because energy consumption and cooling account for much of the cost of running a data center, energy efficiency is a key concern. Increasing efficiency involves removing stalls and unnecessary overheads such as unused thread state. Data center GPUs are also shared among multiple users and jobs, so controlling interference between workloads running on the same GPU is another opportunity to increase utilization and efficiency.
This thesis develops techniques to manage and streamline the data storage and movement resources in GPUs. To reduce the amount of data moved from memory, a new type of locality is exploited to merge additional memory requests. To shrink the storage needed for thread state, compiler-guided dynamic register allocation and management are used to keep only a subset of registers in high-throughput register structures. Finally, to increase utilization and throughput for shared data center GPUs, dynamic performance prediction and resource allocation are used to meet performance targets for workloads running on the same GPU.