Optimizing Direct Convolutions on ARM Multi-Cores

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2023)

Abstract
Convolution kernels are widely used in deep learning workloads and are often responsible for performance bottlenecks. Recent research has demonstrated that a direct convolution approach can outperform traditional convolution implementations based on tensor-to-matrix conversions. However, existing approaches for direct convolution still leave room for performance improvement. We present nDirect, a new direct convolution approach targeting ARM-based multi-core CPUs commonly found in smartphones and HPC systems. nDirect is compatible with the data layout formats used by mainstream deep learning frameworks, but offers new optimizations for the computational kernel, data packing, and parallelization. We evaluate nDirect by applying it to representative convolution kernels on four distinct ARM multi-core CPU platforms, comparing it against state-of-the-art convolution optimization techniques. Experimental results show that nDirect delivers the best overall performance across evaluation scenarios and platforms.
Keywords
Convolution, Direct Algorithm, Neural Networks, ARMv8 Multi-Core, Performance Optimization