High-Performance Matrix-Vector Multiplication on the GPU

Euro-Par 2011: Parallel Processing Workshops, Part I (2011)

Abstract
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We further show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
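For context, a minimal CPU reference sketch of the operation the paper accelerates (this is not the paper's CUDA kernel): dense matrix-vector multiplication computes y = A·x, and each row's dot product is independent, which is the parallelism a GPU kernel exploits by mapping rows (and, at finer grain, partial dot products within a row) to threads.

```python
# Reference sketch (assumed illustration, not the paper's kernel):
# dense matrix-vector multiply y = A @ x.
# Each row's dot product is independent -> on a GPU, rows (and
# partial sums within a row) can be computed by separate threads.
def gemv(A, x):
    """A: list of m rows, each of length n; x: length-n vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3 matrix
x = [1, 1, 1]
print(gemv(A, x))  # -> [6, 15]
```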
Keywords
recent Nvidia Tesla 20-series, GPU kernel, fine-grained parallelism, matrix shape, popular dense linear algebra, many-core GPU, high-performance GPU kernel, Fermi architecture, dense matrix-vector multiplication, matrix-vector multiplication, high-performance matrix-vector multiplication