Microarchitecture for defect tolerance and resiliency

Microarchitecture for defect tolerance and resiliency (2007)

Abstract
Continued device scaling allows faster and more complex CPUs but comes at the cost of an increased likelihood of CPU failures. This thesis addresses this worsening problem at the architectural level, proposing and evaluating three microarchitectures designed to compensate for increasing failure rates. The first microarchitecture targets defects that are evident immediately after fabrication or that arise during burn-in. Conventionally, CPUs with such defects are destroyed, leading to reduced yield and reduced profitability. This thesis proposes a superscalar architecture that allows defects to be isolated to architectural components (within a single CPU core) that can be disabled, leaving a functionally correct CPU and increasing yield. The second microarchitecture targets failures that arise in the field. In this microarchitecture I augment current Simultaneous Multi-Threading (SMT) hardware to redundantly execute instructions on different microarchitectural structures within the same CPU core. This microarchitecture thus constantly monitors for failures, allowing defects to be detected as soon as they arise. The third microarchitecture also targets failures that arise in the field but trades off some detection latency to significantly reduce the energy and performance cost of redundancy-based detection. In this microarchitecture I propose area-efficient architectural support that allows high-coverage testing phases to be transparently interleaved with computation in the field.
Keywords
functionally-correct CPU,CPU failure,architectural component,defect tolerance,thesis targets defect,thesis targets failure,architectural level,single CPU core,CPU core,complex CPUs,area-efficient architectural support