Performance, portability, and productivity for data-parallel applications on multi- and many-core architectures

Proceedings Companion of the 2019 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (2019)

Abstract
We present a novel approach to performance, portability, and productivity of data-parallel computations on multi- and many-core architectures. Our approach is based on Multi-Dimensional Homomorphisms (MDHs) -- a formally defined class of functions that cover important data-parallel computations, e.g., linear algebra routines (BLAS) and stencil computations. For MDHs, we present a high-level Domain-Specific Language (DSL) that contributes to high user productivity, and we propose a corresponding DSL compiler that automatically generates optimized (auto-tuned) OpenCL code, thereby providing high, portable performance over different architectures and input sizes for programs in our DSL. Our experimental results on Intel CPU and NVIDIA GPU demonstrate competitive and often significantly better performance of our approach as compared to state-of-practice approaches, e.g., Intel MKL/MKL-DNN and NVIDIA cuBLAS/cuDNN.
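For intuition only, the sketch below illustrates the MDH view of a BLAS routine (matrix multiplication) in plain Python: a scalar function is applied at every point of a multi-dimensional index space, and each dimension is combined with its own operator (here, concatenation for the i and j dimensions and addition for the k dimension). The function name md_hom_matmul and the code itself are illustrative assumptions for this summary; they are not the authors' DSL, their compiler, or the generated OpenCL code.

```python
# Minimal, illustrative sketch of the MDH idea (not the authors' DSL):
# apply a scalar function f(i, j, k) = A[i][k] * B[k][j] over the index
# space (i, j, k), then combine dimensions i and j by building the result
# grid (concatenation) and dimension k by addition (the reduction).

def md_hom_matmul(A, B):
    """Naive 'MDH-style' matrix multiplication used only for intuition."""
    n, m, K = len(A), len(B[0]), len(B)
    # i, j: combined by concatenation (result layout); k: combined by '+'.
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(m)]
            for i in range(n)]

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(md_hom_matmul(A, B))  # [[19, 22], [43, 50]]
```

In the approach described by the paper, such computations are instead expressed in the high-level DSL, and the DSL compiler produces auto-tuned OpenCL code for the target architecture rather than relying on a hand-written loop nest like the one above.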
Keywords
Auto-Tuning, BLAS, GPU, Multi-Dimensional Homomorphisms, OpenCL, Stencils, multi-core CPU