GCD(2) : A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs

55th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

Abstract
More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system, GCD(2), developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and a VLIW pipeline with the notion of soft dependencies. GCD(2) comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for the implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD(2) outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0x speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5x, 3.4x, and 4.0x speedup, respectively. GCD(2) is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.
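The second contribution above frames instruction and layout selection as a global, rather than per-operator, optimization: the locally fastest instruction for one operator may force a costly layout conversion before the next. A minimal sketch of that idea, under the simplifying assumption of a chain-shaped DNN with per-layout operator costs and pairwise conversion costs (all names and numbers are illustrative, not the paper's actual formulation), is a dynamic program over operators:

```python
def select_variants(op_costs, convert_cost):
    """Globally pick one layout (i.e., one instruction variant) per operator.

    op_costs: list of dicts {layout: execution cost}, one dict per operator,
              in dataflow order along a chain of operators.
    convert_cost(a, b): cost of converting data from layout a to layout b
              between adjacent operators.
    Returns (total cost, list of chosen layouts).
    """
    # best[layout] = (cheapest total cost ending in this layout, chosen path)
    best = {lay: (c, [lay]) for lay, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for layout, c in costs.items():
            # cheapest predecessor layout, accounting for conversion cost
            prev = min(best, key=lambda p: best[p][0] + convert_cost(p, layout))
            prev_cost, path = best[prev]
            nxt[layout] = (prev_cost + convert_cost(prev, layout) + c,
                           path + [layout])
        best = nxt
    end = min(best, key=lambda lay: best[lay][0])
    return best[end]


# Two operators, two hypothetical layouts: the globally optimal choice
# keeps "rowmajor" for op 1 even though "tiled" would also cost 5 locally,
# then pays one conversion to reach the cheap "tiled" variant of op 2.
cost, layouts = select_variants(
    [{"rowmajor": 3, "tiled": 5}, {"rowmajor": 4, "tiled": 1}],
    lambda a, b: 0 if a == b else 2,
)
```

A greedy per-operator choice would miss such trade-offs; the chain-DP sketch has cost O(n * L^2) for n operators and L layouts, whereas general DNN graphs (with fan-out) make the problem harder, which is presumably why the paper poses it as a global optimization.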
Keywords
VLIW instruction packing, compiler optimization, deep neural network, mobile devices