Optimizing Matrix Multiplication on Intel® Xeon Phi™ x200 Architecture
2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), 2017
Abstract
Matrix multiplication is ubiquitous in scientific computing. From computational science to machine learning, a large and diverse set of applications rely on the performance of general matrix-matrix multiplication (GEMM) subroutines. The Intel® Math Kernel Library provides highly optimized GEMM subroutines that take full advantage of the available parallelism and vectorization in both Intel® Xeon® and Intel® Xeon Phi™ processors. In this paper we discuss the optimization of GEMM subroutines for the Intel® Xeon Phi™ x200 (code-named Knights Landing).
Keywords
matrix multiplication, Intel Xeon Phi, BLAS, performance optimization