Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size

International Conference on Machine Learning (ICML), Vol. 139 (2021)

Abstract
Datacenter vision systems widely use small, specialized convolutional neural networks (CNNs) trained on specific tasks for high-throughput inference. These settings employ accelerators with massive computational capacity, which specialized CNNs underutilize due to their low arithmetic intensity. This results in suboptimal application-level throughput and poor returns on accelerator investment. Increasing batch size is the only known way to increase both application-level throughput and accelerator utilization for inference, but it yields diminishing returns; specialized CNNs underutilize accelerators even at large batch sizes. We propose FoldedCNNs, a new approach to CNN design that increases inference throughput and utilization beyond what large batch sizes achieve. FoldedCNNs rethink the structure of the inputs and layers of specialized CNNs to boost arithmetic intensity: in FoldedCNNs, f images with C channels each are concatenated into a single input with fC channels and jointly classified by a wider CNN. The increased arithmetic intensity of FoldedCNNs improves the throughput and GPU utilization of specialized CNN inference by up to 2.5× and 2.8×, respectively, with accuracy close to that of the original CNN in most cases.
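The input transformation described in the abstract can be sketched in a few lines. The following is a minimal illustration, not code from the paper: it assumes row-major NCHW image batches and folds groups of f consecutive images along the channel axis, so a batch of N inputs with C channels becomes N/f inputs with fC channels each. The function name `fold_inputs` is hypothetical.

```python
import numpy as np

def fold_inputs(images: np.ndarray, f: int) -> np.ndarray:
    """Concatenate groups of f consecutive images along the channel axis.

    images: array of shape (N, C, H, W), with N divisible by f.
    Returns an array of shape (N // f, f * C, H, W), the "folded" batch
    that a wider CNN would jointly classify.
    """
    n, c, h, w = images.shape
    assert n % f == 0, "batch size must be divisible by the fold factor f"
    # In row-major layout, this reshape places the channels of image i
    # directly after those of image i-1 within each folded input.
    return images.reshape(n // f, f * c, h, w)

# Example: fold 8 RGB images (C=3) with f=4 into 2 inputs with 12 channels.
batch = np.random.rand(8, 3, 32, 32)
folded = fold_inputs(batch, f=4)
print(folded.shape)  # (2, 12, 32, 32)
```

Note that folding reduces the number of inputs the network processes by f while widening each layer, which is what raises arithmetic intensity (more FLOPs per byte of activations moved) relative to simply enlarging the batch.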