Practical SIMD Vectorization Techniques for Intel® Xeon Phi Coprocessors

Xinmin Tian, Hideki Saito, Serguei V. Preis, Eric N. Garcia, Sergey S. Kozhukhov, Matt Masten, Aleksei G. Cherkasov, Nikolay Panchenko

Parallel and Distributed Processing Symposium Workshops & PhD Forum (2013)

Abstract

The Intel® Xeon Phi coprocessor is based on the Intel® Many Integrated Core (Intel® MIC) architecture, an innovative processor architecture that combines abundant thread parallelism with long SIMD vector units. Efficiently exploiting these SIMD vector units is one of the most important factors in achieving high performance for application code running on Intel® Xeon Phi coprocessors. In this paper, we present several practical SIMD vectorization techniques implemented in the Intel® C/C++ and Fortran production compilers for Intel® Xeon Phi coprocessors, including less-than-full-vector loop vectorization, Intel® MIC-specific alignment optimization, and 2-D vectorization of small matrix transpose/multiplication. We evaluate these techniques on a set of workloads from several application domains. The results show performance gains of up to 12.5x on the Intel® Xeon Phi coprocessor.
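To make the first technique concrete, here is a conceptual sketch of less-than-full-vector loop vectorization written in portable C. It is an illustration under stated assumptions, not the compilers' actual implementation or generated code: it assumes a 16-lane vector (16 single-precision values, matching the coprocessor's 512-bit vector registers) and emulates masked SIMD execution with a scalar loop over lanes. The names VLEN, masked_vadd, and vadd_masked_remainder are hypothetical. The main loop processes full vectors; the leftover n % 16 elements are handled as one partial vector under a lane mask rather than by a scalar remainder loop.

```c
#include <stddef.h>

/* Assumed lane count: 16 floats fill one 512-bit vector register. */
#define VLEN 16

/* Scalar model of one masked vector add (hypothetical helper).
   Lanes whose mask bit is 0 are neither read nor written, mirroring
   how a masked SIMD instruction leaves inactive lanes untouched. */
static void masked_vadd(float *dst, const float *a, const float *b,
                        unsigned int mask)
{
    for (int lane = 0; lane < VLEN; ++lane) {
        if (mask & (1u << lane)) {
            dst[lane] = a[lane] + b[lane];
        }
    }
}

/* c[i] = a[i] + b[i] for i in [0, n) (hypothetical example kernel).
   Full vectors run with every lane enabled; the final partial vector
   runs once with only the low (n % VLEN) lanes enabled. */
void vadd_masked_remainder(float *c, const float *a, const float *b,
                           size_t n)
{
    const unsigned int full_mask = (1u << VLEN) - 1u; /* all 16 lanes on */
    size_t i = 0;

    /* Main loop: whole vectors only. */
    for (; i + VLEN <= n; i += VLEN) {
        masked_vadd(&c[i], &a[i], &b[i], full_mask);
    }

    /* Less-than-full vector: one masked operation instead of a scalar loop. */
    size_t rem = n - i;
    if (rem > 0) {
        unsigned int tail_mask = (1u << rem) - 1u; /* low rem lanes on */
        masked_vadd(&c[i], &a[i], &b[i], tail_mask);
    }
}
```

For n = 37, the main loop covers two full vectors (32 elements) and the tail runs once with mask 0x1F, so only lanes 0 through 4 of the last vector are active and memory past element 36 is never touched. Because the mask is tested before each access, this model reads and writes nothing beyond the end of the arrays.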

Key words

high performance,optimisation,intel many integrated core architecture,simd vector unit,fortran,compiler optimization,performance gain,performance study,parallel architectures,xeon phi,less-than-full-vector loop vectorization,intel® xeon phi coprocessor,intel® mic architecture,intel xeon phi coprocessors,2-d vectorization,performance result,small matrix transpose-multiplication 2-d vectorization,long simd vector unit,coprocessors,innovative new processor architecture,simd vectorization technique,fortran production compilers,practical simd vectorization techniques,simd vectorization,simd vectorization techniques,vectors,registers,computer architecture,parallel processing,optimization