Changhee Jung and Dongyoon Lee, assistant professors in computer science, received an NSF award for a project titled “Compiler and Architectural Techniques for Soft Error Resilience”. Professor Jung is the principal investigator and Professor Lee will serve as co-principal investigator. The full abstract follows:
Due to technology scaling, electronic circuits are becoming more susceptible to radiation-induced soft errors, also known as transient faults. Soft errors may lead to application crashes or, even worse, silent data corruptions (SDCs) that are not caught by the error detection logic but may cause the application to produce incorrect output. Another serious problem is the rise of detected unrecoverable errors (DUEs), which often directly impact the reliability of computer applications. The challenge is to achieve soft error resilience without significantly increasing performance overhead, power consumption, or the complexity of the underlying hardware. To this end, this project designs and develops low-cost hardware/software cooperative techniques for soft error resilience. The resulting artifacts and technologies are expected to contribute to the nation’s competitiveness by addressing the challenge of building reliable computing systems in the presence of soft errors.
This research involves three intermediate research goals: design a novel microarchitecture that dynamically verifies the correctness of processor core execution based on sensor-based soft error detection, to achieve soft error resilience at low cost; design a compiler that forms verifiable and recoverable regions in the presence of soft errors and provides relevant program analysis techniques; and design and develop compiler optimization and microarchitectural techniques that significantly reduce the verification overhead. This project will create tools and technologies for the realization of soft-error-resilient computing systems, contributing fundamentally to the fault tolerance research community. Adoption of the resulting compiler and microarchitectural techniques will impact a broad range of disciplines that need correct computation results and therefore require reliable computing systems, from mobile devices to high-performance, large-scale computing systems. Consequently, use of the resulting technologies will make the execution of current and emerging applications much more reliable, and therefore directly affect our way of life.