Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Abstract
Recently, convolutional neural networks (CNNs) have achieved great success in fields such as computer vision, natural language processing, and artificial intelligence. Many of these applications rely on parallel processing on GPUs to achieve higher performance. However, optimizing for GPUs remains a daunting task, and most researchers have to rely on vendor-provided libraries for this purpose. In this paper, we discuss an operator that can succinctly express computational kernels in CNNs and in various scientific and vision applications. This operator, called Unrolled-Memory-Inner-Product (UMI), is computationally efficient and requires fewer code tokens to express. Since a naive UMI implementation would increase the memory requirement through input data unrolling, we propose a method to achieve optimal memory fetch performance on modern GPUs. We demonstrate this operator by converting several popular applications into the UMI representation, achieving 1.3x-26.4x speedups over frameworks such as OpenCV and Caffe.
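The abstract does not define UMI formally, so the following is only a rough illustration, not the paper's operator or its GPU implementation. It assumes the name refers to the familiar unroll-then-inner-product pattern (as in im2col-style convolution) and uses a 1-D signal in plain Python. All function names here are hypothetical. The sketch shows why naive unrolling inflates memory: each overlapping window is copied into its own row before the inner products are taken.

```python
# Illustrative sketch only: a 1-D "unroll, then inner product" computation.
# The function names are hypothetical and not taken from the paper.

def unroll_patches(signal, k):
    """Copy every length-k sliding window of `signal` into its own row."""
    return [signal[i:i + k] for i in range(len(signal) - k + 1)]


def inner_product(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def unrolled_inner_products(signal, kernel):
    """Naive version: materialize all windows, then take one dot product per row."""
    rows = unroll_patches(signal, len(kernel))
    return [inner_product(row, kernel) for row in rows]


def direct_inner_products(signal, kernel):
    """Same result without materializing the unrolled windows."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]

rows = unroll_patches(signal, len(kernel))
# The unrolled copy stores 3 rows of 3 values (9 numbers) for a 5-element input.
stored_values = sum(len(row) for row in rows)

result_unrolled = unrolled_inner_products(signal, kernel)
result_direct = direct_inner_products(signal, kernel)
```

Both functions return the same values; the difference is that the naive version stores 9 numbers for a 5-element input, and that overhead grows with the window size. The abstract's proposed method targets this kind of overhead on GPUs, though its actual mechanism is not described here.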
Keywords
Unrolled Memory Inner-Products, abstract GPU operator, convolutional neural networks, computer vision, natural language processing, artificial intelligence, parallel processing, memory requirement, UMI representation, CNN, vision-related computations