Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms

2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)(2019)

Abstract
We address a key challenge in programming high-performance applications: achieving portable performance, i.e., the same source code achieves a consistent, high level of performance across the variety of modern parallel processors, including multi-core CPUs and many-core Graphics Processing Units (GPUs), and across the variety of input sizes. Our approach relies on the algebraic formalism of Multi-Dimensional Homomorphisms (MDHs), which enables expressing data-parallel computations uniformly via a higher-order function (a.k.a. parallel pattern). For MDHs, we develop a novel code generation approach based on a generic OpenCL implementation. Our implementation efficiently exploits OpenCL's abstract platform and memory models, generically for arbitrary MDH functions, by incorporating a parameterized parallelization and tiling strategy on both layers of OpenCL's two models and in all dimensions of the multi-dimensional input. We achieve performance portability for MDHs by auto-tuning the parameters of these two strategies, thereby enabling fully automatic optimization of our code for any combination of MDH function, target device, and input size. For computations from four popular domains (dense linear algebra/BLAS, stencil computations, data mining, and tensor contractions), we demonstrate how to express them in the MDH formalism, and we experimentally show that our automatically generated and auto-tuned code for them achieves competitive and often significantly better performance than several state-of-practice approaches on both Intel multi-core CPUs and NVIDIA many-core GPUs: speedups of up to 5x over state-of-the-art performance-portable approaches, and competitive or even better performance compared to hand-optimized libraries such as Intel MKL and NVIDIA cuBLAS on real-world input data as used in deep learning.
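As a rough illustration of why the homomorphism property makes tiling safe, the sketch below expresses matrix-vector multiplication (BLAS GEMV) in an MDH-like style. It assumes the usual MDH view of GEMV: results are combined by concatenation along the row dimension i and by addition along the reduction dimension k. Because each dimension has its own combine operator, any tile sizes give the same result, so tile sizes become free tuning parameters. This is not the paper's OpenCL implementation or API: all function names (`matvec_tile`, `tiled_matvec`, `autotune`, etc.) are hypothetical, and the exhaustive timing loop is only a simple stand-in for the paper's auto-tuning.

```python
import time
from functools import reduce
from itertools import product
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]
Vector = Sequence[int]


def matvec_tile(M: Matrix, v: Vector, rows: range, cols: range) -> List[int]:
    """Partial result y[i] = sum over k in cols of M[i][k] * v[k], for i in rows."""
    return [sum(M[i][k] * v[k] for k in cols) for i in rows]


def concat(a: List[int], b: List[int]) -> List[int]:
    """Hypothetical combine operator for dimension i: concatenate row results."""
    return a + b


def add_elementwise(a: List[int], b: List[int]) -> List[int]:
    """Hypothetical combine operator for dimension k: add partial sums point-wise."""
    return [x + y for x, y in zip(a, b)]


def tiled_matvec(M: Matrix, v: Vector, tile_i: int, tile_k: int) -> List[int]:
    """Compute M @ v tile by tile (tile sizes >= 1).

    Correct for any tile sizes because the partial results of each dimension
    are merged with that dimension's combine operator.
    """
    n_rows, n_cols = len(M), len(v)
    result: List[int] = []
    for i0 in range(0, n_rows, tile_i):
        rows = range(i0, min(i0 + tile_i, n_rows))
        # Process the k dimension tile by tile for the current row tile.
        partials = [
            matvec_tile(M, v, rows, range(k0, min(k0 + tile_k, n_cols)))
            for k0 in range(0, n_cols, tile_k)
        ]
        # Reduce k-tiles with addition (start from zeros so an empty k works),
        # then append the row tile's results via concatenation.
        result = concat(result, reduce(add_elementwise, partials, [0] * len(rows)))
    return result


def autotune(M: Matrix, v: Vector,
             candidates: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Naive exhaustive search: time each (tile_i, tile_k) and keep the fastest."""
    best_time, best_cfg = float("inf"), candidates[0]
    for cfg in candidates:
        start = time.perf_counter()
        tiled_matvec(M, v, *cfg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_time, best_cfg = elapsed, cfg
    return best_cfg


M = [[1, 2, 3], [4, 5, 6]]
v = [1, 0, -1]
print(tiled_matvec(M, v, 1, 2))  # [-2, -2]
print(autotune(M, v, list(product([1, 2], [1, 2, 3]))))
```

Integers are used so that different tilings give bit-identical results; with floating-point data, reordering the additions can change the last bits. The tuned configuration printed by the last line depends on timing noise and the machine.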
Keywords
Performance Portability, Multi-Dimensional Homomorphisms, OpenCL, Auto-Tuning, GPU, Multi-Core CPU, BLAS, Deep Learning, Stencils, Tensor Contractions