Evaluating Thread Coarsening And Low-Cost Synchronization On Intel Xeon Phi

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020(2020)

引用 3|浏览11
暂无评分
摘要
Manycore processors such as GPUs and Intel Xeon Phis have become popular due to their massive parallelism and high power-efficiency. To achieve optimal performance, it is necessary to optimize the use of the compute cores and of the memory system available on these devices. Previous work has proposed techniques to improve the use of the GPU resources. While Intel Phi can provide massive parallelism through their x86 cores and vector units, optimization techniques for these platforms have received less consideration.In this work, we study the benefits of thread coarsening and low-cost synchronization on applications running on Intel Xeon Phi processors and encoded in SIMT fashion. Specifically, we explore thread coarsening as a way to remap the work to the available cores and vector lanes. In addition, we propose low- overhead synchronization primitives, such as atomic operations and barriers, which transparently apply to threads mapped to the same or different VPUs and x86 cores. Finally, we consider the combined use of thread coarsening and our proposed synchronization primitives. We evaluate the effect of these techniques on the performance of two kinds of kernels: collaborative and non-collaborative ones, the former using scratchpad memory to explicitly control data sharing among threads. Our evaluation leads to the following results. First, while not always beneficial for non-collaborative kernels, thread coarsening improves the performance of collaborative kernels consistently by reducing the synchronization overhead. Second, our synchronization primitives outperform standard pthread APIs by a factor up to 8x in real-world benchmarks. Last, the combined use of the proposed techniques leads to performance improvements, especially for collaborative kernels.
更多
查看译文
关键词
SIMT, manycore processors, Intel Xeon Phi, thread coarsening, synchronization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要