Self-repair of uncore components in robust system-on-chips: An OpenSPARC T2 case study
ITC(2013)
Abstract
Self-repair replaces/bypasses faulty components in a system-on-chip (SoC) to keep the system functioning correctly even in the presence of permanent faults. Such faults may result from early-life failures, circuit aging, and manufacturing defects and variations. Unlike on-chip memories, processor cores, and networks-on-chip, little attention has been paid to self-repair of uncore components (e.g., cache controllers, memory controllers, and I/O controllers) that occupy significant portions of multi-core SoCs. In this paper, we present new techniques that utilize architectural features to achieve self-repair of uncore components while incurring low area, power, and performance costs. We demonstrate the effectiveness and practicality of our techniques, using the industrial OpenSPARC T2 SoC with 8 processor cores that support 64 hardware threads. Our key results are:
1. Our techniques enable effective self-repair of any single faulty uncore component with 7.5% post-layout chip-level area impact and 3% power impact. In contrast, existing redundancy techniques impose high (e.g., 16%) area costs. Our techniques do not incur any performance impact in fault-free systems. In the presence of a single faulty uncore component, there can be a 5% application performance impact.
2. Our techniques are capable of self-repairing multiple faulty uncore components without any additional area impact, but with graceful degradation of application performance.
3. Our techniques achieve high self-repair coverage of 97.5% in the presence of a single fault. Our self-repair techniques also enable flexible tradeoffs between self-repair coverage and area costs. For example, 75% self-repair coverage can be achieved with 3.2% post-layout chip-level area impact.
Keywords
uncore component self-repair,hardware threads,power impact,circuit aging,permanent faults,early-life failures,processor cores,cache controllers,manufacturing defects,system-on-chip,industrial opensparc t2 soc,multiprocessing systems,i-o controllers,failure analysis,fault-free systems,opensparc t2 case study,fault diagnosis,multicore soc,robust system-on-chips,post-layout chip-level area impact,integrated circuit layout,single-faulty uncore component,network-on-chip,multiple-faulty uncore components,on-chip memories,memory controllers,system on chip