
A performance evaluation of CCS QCD Benchmark on the COMA (Intel® Xeon Phi™, KNC) system

arXiv: High Energy Physics - Lattice (2016)

Abstract
The most computationally demanding part of Lattice QCD simulations is solving for quark propagators. Quark propagators are typically obtained with a linear equation solver running on HPC machines. The CCS QCD Benchmark is a benchmark program that solves for the Wilson-Clover quark propagator and is developed at the Center for Computational Sciences (CCS), University of Tsukuba. We optimized the benchmark program for an Intel Xeon Phi (Knights Corner, KNC) system named "COMA (PACS-IX)" at CCS Tsukuba under the Intel Parallel Computing Center program. A single-precision BiCGStab solver with the overlapped Restricted Additive Schwarz (RAS) preconditioner was implemented using SIMD intrinsics, OpenMP, and MPI in the offload mode. With the reverse-offloading technique, we could reduce the communication and offloading overheads. We observed a sustained performance of ~200 GFlops for the Wilson-Clover hopping matrix multiplication on lattice sizes larger than 24^3 × 32 on a single card of the COMA system. Good weak-scaling performance was observed on local lattice sizes larger than 24^3 × 32.
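For readers unfamiliar with the solver named in the abstract, the following is a minimal sketch of the unpreconditioned BiCGStab iteration. It is not the benchmark's implementation: it runs in double precision on a small dense real matrix, omits the RAS preconditioner, SIMD, OpenMP, and MPI, and the names bicgstab, matvec, and dot, as well as the 3x3 test system, are illustrative assumptions.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product (stand-in for the Wilson-Clover operator)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Real inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b with unpreconditioned BiCGStab, starting from x = 0.

    Returns (x, iterations). Illustrative sketch: no breakdown handling.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)            # initial residual r = b - A x with x = 0
    r_hat = list(r)        # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b))

    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        # Update the search direction.
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        # Intermediate residual after the BiCG half-step.
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, it
        t = matvec(A, s)
        # Stabilizing step: omega minimizes the norm of s - omega * t.
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho = rho_new
    return x, max_iter


# Small nonsymmetric test system (assumed for illustration only).
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
expected = [1.0, 2.0, 3.0]
b = matvec(A, expected)
x, iters = bicgstab(A, b)
```

Each iteration costs two matrix-vector products, which is why the abstract reports performance for the hopping matrix multiplication: that kernel dominates the solver's run time.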
Keywords
CCS QCD Benchmark, KNC