Tree Partitioning Reduction: A New Parallel Partition Method for Solving Tridiagonal Systems.

ACM Transactions on Mathematical Software (2019)

Abstract
Solving tridiagonal systems of linear equations is a fundamental computing kernel in a wide range of scientific and engineering applications, and the computation can be expressed with parallel algorithms. These parallel solvers are typically designed for problems whose data fit in a common shared-memory space that all the cores taking part in the computation can access. However, when the problem is large, the data cannot be stored entirely in the common shared-memory space, and many high-latency communications are required. One alternative is to partition the problem among different memory spaces. Conventional parallel algorithms, however, do not make it easy to partition the computation into independent tiles, since each reduction depends on equations that may be in different tiles. This article proposes an algorithm based on a tree reduction, called the Tree Partitioning Reduction (TPR) method, which partitions the problem into independent slices that can be partially computed in parallel within different common shared-memory spaces. The TPR method can be implemented in any parallel and distributed programming paradigm. Furthermore, in this work, TPR is efficiently implemented for CUDA GPUs to solve large problems, providing highly competitive performance with respect to existing packages, being, on average, 22.03× faster than CUSPARSE.
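The tile-dependency problem mentioned in the abstract can be illustrated with a small sketch. This is not the TPR method itself, whose details are not given here. It shows one step of classic cyclic reduction, a conventional parallel scheme, in which reducing equation i needs equations i-1 and i+1, so an equation at a tile boundary reads data from a neighbouring tile. The function names (`thomas`, `reduce_equation`, `tiles_touched`), the tile size, and the sample system are all assumptions made for illustration.

```python
def thomas(a, b, c, d):
    """Sequential baseline: solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    Assumes a[0] == 0 and c[-1] == 0, and a well-conditioned
    (e.g. diagonally dominant) system so no pivoting is needed.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def reduce_equation(a, b, c, d, i):
    """One cyclic-reduction step on equation i of the original system.

    Eliminates x[i-1] and x[i+1] using the neighbouring equations and
    returns coefficients (na, nb, nc, nd) of an equation that couples
    x[i-2], x[i], and x[i+2] only.
    """
    n = len(b)
    na, nb, nc, nd = 0.0, b[i], 0.0, d[i]
    if i - 1 >= 0:
        k1 = a[i] / b[i - 1]          # multiplier for equation i-1
        na = -k1 * a[i - 1]
        nb -= k1 * c[i - 1]
        nd -= k1 * d[i - 1]
    if i + 1 < n:
        k2 = c[i] / b[i + 1]          # multiplier for equation i+1
        nc = -k2 * c[i + 1]
        nb -= k2 * a[i + 1]
        nd -= k2 * d[i + 1]
    return na, nb, nc, nd


def tiles_touched(i, tile_size, n):
    """Indices of the tiles that hold the equations needed to reduce equation i."""
    return {j // tile_size for j in (i - 1, i, i + 1) if 0 <= j < n}


# Hypothetical 8-equation system split into two tiles of 4 equations each.
n = 8
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
d = [float(i + 1) for i in range(n)]
x = thomas(a, b, c, d)

interior = tiles_touched(1, 4, n)   # equation well inside tile 0
boundary = tiles_touched(3, 4, n)   # last equation of tile 0
```

Here `interior` contains a single tile while `boundary` contains two, which is the cross-tile dependency that, per the abstract, makes conventional reductions hard to split into independent tiles.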
Keywords
CUDA, CUSPARSE, GPU, tridiagonal systems, tuning