Resilient Neural Network Training for Accelerators with Computing Errors

Dawen Xu, KouZi Xing, Cheng Liu, Ying Wang, Yulin Dai, Long Cheng, Huawei Li, Lei Zhang

2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)（2019）

引用 13|浏览58

暂无评分

摘要

With the advancements of neural networks, customized accelerators are increasingly adopted in massive AI applications. To gain higher energy efficiency or performance, many hardware design optimizations such as near-threshold logic or overclocking can be utilized. In these cases, computing errors may happen and the computing errors are difficult to be captured by conventional training on general purposed processors (GPPs). Applying the offline trained neural network models to the accelerators with errors directly may lead to considerable prediction accuracy loss. To address this problem, we explore the resilience of neural network models and relax the accelerator design constraints to enable aggressive design options. First of all, we propose to train the neural network models using the accelerators' forward computing results such that the models can learn both the data and the computing errors. In addition, we observe that some of the neural network layers are more sensitive to the computing errors. With this observation, we schedule the most sensitive layer to the attached GPP to reduce the negative influence of the computing errors. According to the experiments, the neural network models obtained from the proposed training outperform the original models significantly when the CNN accelerators are affected by computing errors.

查看译文

关键词

resilient training,CNN accelerator,relaxed design constrain,fault tolerance

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要