cuSCNN: An Efficient CUDA Implementation of Sparse CNNs

Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2023)

Abstract
Deep Neural Network models are becoming much larger, which greatly increases their computation and memory requirements. Sparsity offers great opportunities to reduce unnecessary data transfers and computations. However, exploiting sparsity in CNN inference presents challenges such as irregular memory access patterns. To overcome this challenge, we propose cuSCNN, an efficient sparse CNN inference engine that leverages the sparsity of both models and activations using optimized sparse-sparse matrix convolution kernels with compressed operands. cuSCNN is motivated by the concepts introduced by the SCNN hardware accelerator [21], modified appropriately to achieve an efficient software implementation for GPUs. We develop GPU optimizations that boost execution performance and reduce the required memory size and bandwidth. cuSCNN achieves a speedup of up to 171x over an efficient single-threaded CPU implementation and 30x over a multi-threaded CPU implementation without batching, enabling the use of inexpensive, low-end, memory-constrained GPUs to run large networks with near real-time latency. Although GPU throughput can benefit from larger batch sizes, batch size 1 achieves the lowest latency, so we focus on it.
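To illustrate the sparse-sparse convolution idea that cuSCNN inherits from SCNN [21], here is a minimal NumPy sketch: both operands are held in compressed (nonzero-only) form, each nonzero activation is paired with each nonzero weight (a Cartesian product), and every product is scatter-accumulated to its output coordinate. This is an assumption-laden illustration of the general technique, not the authors' CUDA kernels; the function names are ours.

```python
import numpy as np

def dense_conv2d_valid(act, wt):
    # Reference dense 'valid' 2D convolution (cross-correlation form),
    # used only to check the sparse version below.
    H, W = act.shape
    R, S = wt.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(act[i:i + R, j:j + S] * wt)
    return out

def sparse_conv2d_valid(act, wt):
    # SCNN-style sparse-sparse convolution: iterate only over the
    # compressed nonzeros of activations and weights, and scatter-
    # accumulate each product into the output at (ai - wi, aj - wj).
    H, W = act.shape
    R, S = wt.shape
    out = np.zeros((H - R + 1, W - S + 1))
    a_nz = list(zip(*np.nonzero(act)))  # coordinates of nonzero activations
    w_nz = list(zip(*np.nonzero(wt)))   # coordinates of nonzero weights
    for (ai, aj) in a_nz:
        for (wi, wj) in w_nz:
            oi, oj = ai - wi, aj - wj   # output coordinate for this pair
            if 0 <= oi < out.shape[0] and 0 <= oj < out.shape[1]:
                out[oi, oj] += act[ai, aj] * wt[wi, wj]
    return out
```

The work performed is proportional to the product of the two nonzero counts rather than to the dense tensor sizes, which is where the savings come from when both model and activations are sparse; the irregular scatter into `out` is the memory-access challenge the abstract refers to.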
Keywords
Sparse Convolutional Neural Network (SCNN), Graphics Processing Unit (GPU), Accelerator, CUDA