Reliability and Fault Tolerance

 
Overview
The continued scaling of silicon fabrication technology has raised significant reliability concerns that are quickly becoming a dominant design challenge. Design integrity is threatened on two fronts: complexity challenges, in the form of immense designs that defy complete verification, and physical challenges, such as silicon aging and soft errors, that impair correct system operation. Researchers in the Advanced Computer Architecture Lab are addressing these challenges through complementary research directions, which range from near-term techniques that reduce reliability stress and improve the quality of today’s silicon to longer-term technologies that detect faults, recover from them, and repair faulty systems. These efforts are supported by an active reliability modeling research effort and a strong focus on functional verification methodologies. The overarching goal is to provide highly effective, low-cost solutions that ensure both correctness and reliability in future designs, thereby extending the lifetime of silicon fabrication technologies.
 
Faculty
Austin, Todd
Chen, Peter M.
Dick, Robert
Hayes, John P.
Mahlke, Scott
Mudge, Trevor
Sylvester, Dennis
Zhang, Zhengya

