High-Performance Matrix-Vector Multiplication on the GPU

Euro-Par 2011: Parallel Processing Workshops, Part I (2011)

Abstract
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We further show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
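For context, a minimal CPU reference sketch of the operation the paper accelerates (this is not the paper's CUDA kernel): dense matrix-vector multiplication computes y = A·x, and each row's dot product is independent, which is the parallelism a GPU kernel exploits by mapping rows (and, at finer grain, partial dot products within a row) to threads.

```python
# Reference sketch (assumed illustration, not the paper's kernel):
# dense matrix-vector multiply y = A @ x.
# Each row's dot product is independent -> on a GPU, rows (and
# partial sums within a row) can be computed by separate threads.
def gemv(A, x):
    """A: list of m rows, each of length n; x: length-n vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3 matrix
x = [1, 1, 1]
print(gemv(A, x))  # -> [6, 15]
```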
Keywords
recent Nvidia Tesla 20-series, GPU kernel, fine-grained parallelism, matrix shape, popular dense linear algebra, many-core GPU, high-performance GPU kernel, Fermi architecture, dense matrix-vector multiplication, matrix-vector multiplication, high-performance matrix-vector multiplication