Defense Event

A Hardware/Software Approach for Alleviating Scalability Bottlenecks in Transactional Memory Applications

Geoffrey Wyman Blake

Friday, March 25, 2011
4:00pm - 6:00pm
3725 Beyster Bldg.


About the Event

Scaling processor performance with future technology nodes is essential to enable future applications for devices ranging from smartphones to servers. But the traditional methods of achieving that performance, frequency scaling and single-core architectural enhancements, are no longer viable due to fundamental scaling limits. To continue scaling performance, parallel computers in the form of Chip Multi-processors (CMPs) are now prevalent, moving the challenge of parallel programming from a niche to the general domain. One particularly challenging problem is scalable synchronization of access to shared data structures. Using traditional methods, it can take expert programmers many years to craft a scheme that synchronizes access to the data structures of a complex program both correctly and scalably. Researchers have therefore been searching for ways to make synchronization more tractable. One proposal is "transactional programming," which abstracts synchronization on shared data structures as transactions, in much the same fashion as database operations. Transactional programming can be supported efficiently by a "Transactional Memory" (TM) system. A major problem with TM systems, however, is scalability bottlenecks. When transactional applications are written in the style an average programmer is likely to adopt, performance on a large CMP can be worse than on a single processor, which should not happen on a system meant to make programming easier. It happens because transactions may depend on one another, accessing the same data and therefore serializing, without the programmer being aware of these dependencies, since the abstraction hides system details. This thesis develops a hardware/software approach that alleviates scalability bottlenecks in TM systems while maintaining the level of abstraction that transactional programming presents.
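To illustrate the programming model the abstract describes, the following is a minimal, hypothetical sketch (not from the thesis): the programmer marks a region atomic and the TM runtime guarantees it executes indivisibly, like a database transaction. A single global lock stands in for the runtime here purely for simplicity; a real TM system would execute atomic regions optimistically and roll back on conflict, but the code the programmer writes looks the same.

```python
import threading
from contextlib import contextmanager

# Stand-in for the TM runtime: one global lock. Real TM executes
# atomic regions concurrently and aborts/retries on conflict.
_tm_lock = threading.RLock()

@contextmanager
def atomic():
    """Hypothetical atomic region: everything inside appears to
    execute indivisibly with respect to other atomic regions."""
    with _tm_lock:
        yield

accounts = {"a": 100, "b": 0}

def transfer(src, dst, amount):
    # The programmer declares *what* must be atomic, not *how*
    # to lock the individual data structures involved.
    with atomic():
        accounts[src] -= amount
        accounts[dst] += amount

threads = [threading.Thread(target=transfer, args=("a", "b", 10))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The appeal is that `transfer` needs no knowledge of which locks protect which structures; the downside, as the abstract notes, is that two atomic regions touching the same data must still serialize, whether or not the programmer realizes it.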
I first introduce "Proactive Transaction Scheduling" (PTS), a technique that profiles parallel code at runtime to determine the orders in which transactions should execute to maintain acceptable forward progress. I then propose using PTS to automatically identify transactions that cause large amounts of serialization; these transactions are then accelerated on an asymmetric CMP for better performance. Finally, I show that PTS can also be used to partition the resources of a multi-threaded processor core, yielding better overall performance than a fair partitioning of resources.
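The core idea behind such runtime scheduling can be sketched as follows. This is an illustrative toy, not the thesis's mechanism: the scheduler (here a hypothetical `ProactiveScheduler` class) records which transaction sites have been observed to conflict, and once two sites conflict it serializes their future instances up front, rather than letting them repeatedly abort and retry.

```python
import threading
from collections import defaultdict

class ProactiveScheduler:
    """Toy sketch: profile conflicts between transaction sites at
    runtime, then proactively serialize sites known to conflict."""

    def __init__(self):
        self._lock = threading.Lock()
        self.conflicts = defaultdict(set)  # site -> sites it conflicted with
        self.site_locks = {}               # site -> shared serialization lock

    def record_conflict(self, site_a, site_b):
        # Called by the (hypothetical) TM runtime when two transactions
        # from these sites abort each other.
        with self._lock:
            self.conflicts[site_a].add(site_b)
            self.conflicts[site_b].add(site_a)
            # Give both sites the *same* lock so their future
            # instances execute one at a time.
            lock = self.site_locks.setdefault(site_a, threading.Lock())
            self.site_locks[site_b] = lock

    def run(self, site, txn):
        # Sites with no observed conflicts run unrestricted; sites in a
        # known conflict group take the shared lock, trading a little
        # concurrency for the elimination of wasted aborts.
        lock = self.site_locks.get(site)
        if lock is None:
            return txn()
        with lock:
            return txn()
```

A usage sketch: after `record_conflict("s1", "s2")`, any transactions submitted via `run("s1", ...)` and `run("s2", ...)` execute serially with respect to each other, while an unrelated site `"s3"` is unaffected.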

Additional Information

Sponsor(s): Trevor N. Mudge

Open to: Public