Making the Fault-Tolerance of Emerging Neural Network Accelerators Scalable

2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2019

Citations: 3 | Views: 48
Abstract
Deep neural network (DNN) accelerators built upon emerging technologies, such as memristors, are gaining increasing research attention because of the impressive computing efficiency brought by processing-in-memory. One critical challenge faced by these promising accelerators, however, is their poor reliability: each weight, stored as the memristance (resistance) value of a device, suffers from large uncertainty incurred by unique device physical limitations, e.g., stochastic programming and resistance drift, which translates into prominent test-accuracy degradation. Non-trivial retraining, weight remapping, and redundant-cell fixing are popular approaches to this issue. However, these solutions have limited scalability, since they amount to tedious patch-adding or bug-fixing after identifying each accelerator-dependent defect map. On the other hand, scalable solutions are highly desirable in the envisioned scenario of a neural network trained once in the cloud and deployed to many edge devices, each equipped with an emerging accelerator. In this paper, we discuss the challenge and requirements of fault tolerance in these new accelerators. We then show how to address this problem through a scalable algorithm-hardware co-design method, with a focus on unleashing the algorithmic error resilience of DNN classifiers, so as to eliminate any expensive defect-map-specific calibration or training from scratch.
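The failure mode the abstract describes, and the general flavor of variation-aware fixes, can be illustrated with a toy experiment. The sketch below is not from the paper: it uses a plain logistic-regression classifier as a stand-in for a DNN, models device uncertainty as multiplicative Gaussian noise on the stored weights (a common simplification of stochastic programming variation), and contrasts ordinary training with noise-injection training, one generic variation-aware technique. All names and noise parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable dataset standing in for a DNN classifier's task.
X = rng.normal(size=(600, 8))
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)

def train(noise_std=0.0, epochs=300, lr=0.5):
    """Logistic-regression training by gradient descent.

    If noise_std > 0, each step injects multiplicative weight noise,
    emulating memristor programming variation (illustrative assumption)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w_eff = w * (1 + noise_std * rng.normal(size=w.shape)) if noise_std else w
        p = 1.0 / (1.0 + np.exp(-(X @ w_eff)))          # sigmoid
        w -= lr * X.T @ (p - y) / len(y)                 # logistic-loss gradient
    return w

def accuracy(w, noise_std=0.0, trials=200):
    """Mean accuracy when the deployed weights are perturbed per device,
    averaged over many simulated devices (defect realizations)."""
    accs = []
    for _ in range(trials):
        w_eff = w * (1 + noise_std * rng.normal(size=w.shape))
        accs.append((((X @ w_eff) > 0).astype(float) == y).mean())
    return float(np.mean(accs))

w_plain = train()                  # trained ignoring device noise
w_robust = train(noise_std=0.3)    # noise-injection ("variation-aware") training

print("clean accuracy:          ", accuracy(w_plain, 0.0))
print("plain weights, 30% noise:", accuracy(w_plain, 0.3))
print("robust weights, 30% noise:", accuracy(w_robust, 0.3))
```

The key property this models is scalability: noise-injection training needs only the noise statistics, not each accelerator's individual defect map, so one cloud-trained model can be deployed to many devices without per-device calibration.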
Keywords
neural network training, stochastic programming, DNN classifiers, scalable algorithm-hardware codesign method, edge devices, scalable solutions, accelerator-dependent defect map, redundant cell, non-trivial retraining, resistance drift, processing-in-memory, deep neural network accelerators, fault tolerance