DNN Dataflow Choice Is Overrated.

arXiv: Distributed, Parallel, and Cluster Computing(2018)

引用 101|浏览221
暂无评分
摘要
Many DNN accelerators have been proposed and built using different microarchitectures and program mappings. To fairly compare these different approaches, we modified the Halide compiler to produce hardware as well as CPU and GPU code, and show that Halideu0027s existing scheduling language has enough power to represent all existing dense DNN accelerators. Using this system we can show that the specific dataflow chosen for the accelerator is not critical to achieve good efficiency: many different dataflows yield similar energy efficiency with good performance. However, finding the best blocking and resource allocation is critical, and we achieve a 2.6X energy savings over Eyeriss system by reducing the size of the local register file. Adding an additional level in the memory hierarchy saves an additional 25%. Based on these observations, we develop an optimizer that automatically finds the optimal blocking and storage hierarchy. Compared with Eyeriss system, it achieves up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs) respectively.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要