Defense Event

Routing and Topology Reconfiguration for Networks-on-Chip's Runtime Health

Ritesh Parikh

Friday, August 22, 2014
1:00pm - 3:00pm
3725 BBB

About the Event

As silicon technology evolves, chip multi-processor (CMP) and system-on-chip (SoC) designs are changing dramatically, from limited, robust and homogeneous logic blocks to billions of fragile transistors integrated into complex and heterogeneous cores/IPs. This increased integration has compelled architects to design resource-heavy, complex and power-hungry on-chip interconnects, moving towards network-on-chip (NoC) structures. In addition, the waning reliability of silicon poses a great threat to these communication structures, as they could potentially be a single point of failure. Further, the heterogeneity and fast time-to-market of upcoming devices make it nearly impossible to thoroughly verify NoC architectures and optimize them for power at design time. Failure of NoC architectures to meet correctness, reliability and power-budget requirements has detrimental effects on the runtime operation of NoC-based CMPs and SoCs.

Therefore, highly efficient detection and reconfiguration mechanisms are becoming a key requisite to unlock the full potential of future CMPs and SoCs. Such mechanisms can overcome both functional bugs that escaped design-time verification and device failures due to an unreliable silicon substrate. Similarly, runtime reconfiguration solutions can be leveraged to optimize the communication paths dynamically, in particular to minimize power dissipation and prevent overheating of the NoC structures.

The goal of this dissertation is to develop mechanisms to mitigate threats to NoC runtime health. The proposed solutions are based on monitoring the execution activity of NoCs in a localized and distributed manner using lightweight checkers. Based on the events observed, the routing scheme and network topology are updated at runtime to avoid experiencing the same failures in future operation. This thesis specifically focuses on three aspects of NoC runtime health: correct behavior to avoid functional bugs, reliable execution to circumvent faults, and power-aware reconfiguration to avert overheating emergencies. The work presented in the thesis will enable designers to aggressively push heterogeneity and time-to-market limits with respect to NoC design.

Additional Information

Sponsor(s): Valeria Bertacco

Open to: Public