A pipelining strategy for accelerating convolution neural networks on ARM CPUs

Concurrency and Computation: Practice and Experience (2022)

Abstract
Convolution is the primary operation in convolutional neural networks, and inference speed is largely determined by the speed of the convolutional layers. Advances in embedded processors have made it feasible to run inference on embedded devices. In this article, a pipelining strategy for single-instruction, multiple-data (SIMD) instructions is proposed to finely optimize the 3 × 3 convolution on ARM-based CPUs. We implement SIMD groups to improve the efficiency of the SIMD pipeline, and we exploit a tiling method to increase data reuse during the computation. An evaluation model is proposed to guide the design of the tiling method and the register allocation. On the RK3288, our implementation runs 5.18 times faster than the unoptimized version compiled with the GNU Compiler Collection (GCC). The effect of our optimization is measured with a performance profiling tool; the results suggest that the pipelining strategy has a significant effect for both standard and depthwise separable convolutions. With multithreaded processing, the speedup reaches 18.3 compared with the single-threaded unoptimized version.
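
To make the pipelining idea concrete, below is a minimal sketch in C with ARM NEON intrinsics. It is not the authors' implementation: the function name, data layout (float32, stride 1, no padding, out_w a multiple of 8), and parameters are hypothetical. The sketch shows the basic mechanism the abstract alludes to: keeping two independent vector accumulators in flight so that consecutive multiply-accumulate instructions do not stall on one another's results.

#include <arm_neon.h>

/* Hypothetical helper, not from the paper: computes one output row of a
 * "valid" 3x3 convolution. row0..row2 point to three consecutive input
 * rows, k is the 3x3 kernel in row-major order, out_w = input width - 2. */
static void conv3x3_row_neon(const float *row0, const float *row1,
                             const float *row2, const float *k,
                             float *out, int out_w)
{
    const float *rows[3] = { row0, row1, row2 };
    for (int x = 0; x < out_w; x += 8) {
        float32x4_t acc0 = vdupq_n_f32(0.0f);   /* outputs x .. x+3   */
        float32x4_t acc1 = vdupq_n_f32(0.0f);   /* outputs x+4 .. x+7 */
        for (int r = 0; r < 3; ++r) {
            const float *p = rows[r] + x;
            for (int c = 0; c < 3; ++c) {
                /* Overlapping loads slide the 3-tap window; the two
                 * multiply-accumulates are independent, so the core can
                 * issue them back to back instead of waiting on a single
                 * accumulator's dependency chain. */
                acc0 = vmlaq_n_f32(acc0, vld1q_f32(p + c),     k[3 * r + c]);
                acc1 = vmlaq_n_f32(acc1, vld1q_f32(p + c + 4), k[3 * r + c]);
            }
        }
        vst1q_f32(out + x,     acc0);
        vst1q_f32(out + x + 4, acc1);
    }
}

The paper's contribution goes further (SIMD groups, a tiling scheme for data reuse, and an evaluation model guiding register allocation), but the two-accumulator pattern above illustrates the general way a SIMD pipeline is kept busy.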
Keywords
ARM, CNN, embedded processor, ILP, pipeline, SIMD