Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures.

Jacobo Lobeiras, Margarita Amor, Ramon Doallo

IEEE Trans. Parallel Distrib. Syst. (2016)

Abstract

Modern graphics processing units (GPUs) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for GPUs normally requires additional time and effort, even for experienced programmers. In this work we present a tuning methodology for designing index-digit algorithms on CUDA-enabled GPU architectures, that is, algorithms whose data movement can be described as permutations of the digits comprising the indices of the data elements. This methodology, based on two stages identified as GPU resource analysis and operator string manipulation, is applied to FFT and tridiagonal system solver algorithms, analyzing their performance features and the most adequate solutions. The resulting implementation is compact and outperforms other well-known and commonly used state-of-the-art libraries, with an improvement of up to 19.2 percent over NVIDIA's complex CUFFT, and more than 3000 percent over NVIDIA's CUDPP for real-data tridiagonal systems.
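
To make the notion of an index-digit permutation concrete, the following minimal sketch moves each element to the position obtained by permuting the digits of its index. It is a sequential Python illustration, not the paper's CUDA implementation; the function names `permute_digits` and `apply_index_digit_permutation` are hypothetical, and the bit-reversal reordering commonly associated with radix-2 FFTs is used only as a familiar example.

```python
def permute_digits(index, perm, radix=2):
    """Return the index whose digit i (least significant first)
    is digit perm[i] of the input index. Hypothetical helper."""
    n = len(perm)
    # Split the index into n digits, least significant first.
    digits = []
    for _ in range(n):
        digits.append(index % radix)
        index //= radix
    # Reassemble the digits in permuted order, most significant first.
    result = 0
    for i in reversed(range(n)):
        result = result * radix + digits[perm[i]]
    return result


def apply_index_digit_permutation(data, perm, radix=2):
    """Move data[src] to the position given by permuting the digits of src."""
    assert len(data) == radix ** len(perm)
    out = [None] * len(data)
    for src in range(len(data)):
        out[permute_digits(src, perm, radix)] = data[src]
    return out


# Bit reversal on 3-bit indices: digit 0 swaps with digit 2.
bit_reversal = [2, 1, 0]
data = list(range(8))
reordered = apply_index_digit_permutation(data, bit_reversal)
print(reordered)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because the whole data movement is captured by the short list `perm`, different reorderings (for example, a cyclic rotation of the digits) can be expressed by changing that list alone, without touching the data-movement code.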

Keywords

Graphics processing units, Instruction sets, Registers, Algorithm design and analysis, Kernel, Memory management
