An On-Node Scalable Sparse Incomplete LU Factorization for a Many-Core Iterative Solver with Javelin

Parallel Computing(2020)

引用 3|浏览0
暂无评分
摘要
We present a scalable incomplete LU factorization to be used as a preconditioner for solving sparse linear systems with iterative methods in the package called Javelin. Javelin allows for improved parallel factorization on shared-memory many-core systems by packaging the coefficient matrix into a format that allows for high performance sparse matrix-vector multiplication and sparse triangular solves with minimal overheads. The framework achieves these goals by using a collection of traditional permutations, point-to-point thread synchronizations, tasking, and segmented prefix scans in a conventional compressed sparse row (CSR) format. Moreover, this framework stresses the importance of co-designing dependent tasks, such as sparse factorization and triangular solves, on highly-threaded architectures. We compare our method to the past distributed methods for incomplete factorization (Aztec) and current multithreaded packages (WSMP) in order to demonstrate the importance of having highly threaded factorizations on many-core systems. Using these changes, traditional fill-in and drop tolerance methods can be used, while still being able to have observed speedups of up to  ~ 42 ×  on 68 Intel Knights Landing cores and  ~ 12 ×  on 14 Intel Haswell cores. Moreover, this work provides insight into how the new data-structure impacts iteration counts, and provides insight into future improvements, such as point to GPUs.
更多
查看译文
关键词
Sparse linear algebra,Manycore,Incomplete factorization,Iterative solver,Sparse triangular solve
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要