Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size

Jack Kosaian, Amar Phanishayee, Matthai Philipose, Debadeepta Dey, Rashmi Vinayak

International Conference on Machine Learning, Vol. 139 (2021)


Abstract

Datacenter vision systems widely use small, specialized convolutional neural networks (CNNs) trained on specific tasks for high-throughput inference. These settings employ accelerators with massive computational capacity, which specialized CNNs underutilize because of their low arithmetic intensity. This results in suboptimal application-level throughput and poor returns on accelerator investment. Increasing batch size is the only known way to increase both application-level throughput and accelerator utilization for inference, but it yields diminishing returns; specialized CNNs poorly utilize accelerators even with large batch sizes. We propose FoldedCNNs, a new approach to CNN design that increases inference throughput and utilization beyond what large batch sizes achieve. FoldedCNNs rethink the structure of inputs and layers of specialized CNNs to boost arithmetic intensity: in FoldedCNNs, f images with C channels each are concatenated into a single input with fC channels and jointly classified by a wider CNN. The increased arithmetic intensity of FoldedCNNs increases the throughput and GPU utilization of specialized CNN inference by up to 2.5× and 2.8×, respectively, with accuracy close to that of the original CNN in most cases.
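The input folding described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the concatenation step: f images with C channels each become one input with fC channels. The function name `fold_batch`, the NCHW memory layout, and the choice to fold consecutive images together are assumptions made for illustration, not details taken from the paper. The wider CNN that jointly classifies the folded input is not shown.

```python
import numpy as np


def fold_batch(images: np.ndarray, f: int) -> np.ndarray:
    """Fold a batch of shape (N, C, H, W) into shape (N // f, f * C, H, W).

    Hypothetical helper: each group of f consecutive images is stacked
    along the channel axis, so one folded input carries f images.
    """
    n, c, h, w = images.shape
    if n % f != 0:
        raise ValueError(f"batch size {n} is not divisible by fold factor {f}")
    # For a C-contiguous array, this reshape places image g*f + i
    # into channels [i*C, (i+1)*C) of folded input g.
    return np.ascontiguousarray(images).reshape(n // f, f * c, h, w)


# Example: 8 RGB images (C = 3) of size 2x2, folded with f = 4.
batch = np.arange(8 * 3 * 2 * 2, dtype=np.float32).reshape(8, 3, 2, 2)
folded = fold_batch(batch, f=4)
print(folded.shape)  # (2, 12, 2, 2)
```

In this sketch, image 5 of the original batch ends up in channels 3 through 5 of the second folded input, so the wider network sees all f images of a group in a single forward pass.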
