SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity
CoRR(2023)
摘要
To address the challenge of increasing network size, researchers have
developed sparse models through network pruning. However, maintaining model
accuracy while achieving significant speedups on general computing devices
remains an open problem. In this paper, we present a novel mobile inference
acceleration framework SparseByteNN, which leverages fine-grained kernel
sparsity to achieve real-time execution as well as high accuracy. Our framework
consists of two parts: (a) A fine-grained kernel sparsity schema with a
sparsity granularity between structured pruning and unstructured pruning. It
designs multiple sparse patterns for different operators. Combined with our
proposed whole network rearrangement strategy, the schema achieves a high
compression rate and high precision at the same time. (b) Inference engine
co-optimized with the sparse pattern. The conventional wisdom is that this
reduction in theoretical FLOPs does not translate into real-world efficiency
gains. We aim to correct this misconception by introducing a family of
efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient
implementation of sparse primitives, we show that sparse versions of
MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy
curve. Experimental results on Qualcomm 855 show that for 30% sparse
MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and
1.29x speedup over the state-of-the-art sparse inference engine MNN with a
slight accuracy drop of 0.224%. The source code of SparseByteNN will be
available at https://github.com/lswzjuer/SparseByteNN
更多查看译文
关键词
mobile,fine-grained
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要