Assuring application-level correctness against soft errors

Computer-Aided Design(2011)

引用 66|浏览0
暂无评分
摘要
Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output for such applications is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical -- i. e. their correctness decides if the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions which affect several instructions. Critical instructions are further refined into likely non-critical and likely critical sets in a profiling phase. At runtime, we use a monitoring scheme to monitor likely non-critical instructions and take remedial actions if some likely non-critical instructions become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Comparing to the approach proposed in [20] which guarantees both application-level and numerical correctness, our method achieves 79% energy reduction.
更多
查看译文
关键词
likely critical set,critical program segment,static critical instruction,critical instruction,control flow analysis,analysis technique,application-level correctness,soft error,data flow analysis,likely non-critical instruction,static analysis,thermal,timing analysis,fault tolerant,software fault tolerance,satisfiability
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要