Efficient lock-step synchronization in task-parallel languages.

SOFTWARE-PRACTICE & EXPERIENCE (2019)

Abstract

Many modern task-parallel languages allow the programmer to synchronize tasks using high-level constructs such as barriers, clocks, and phasers. While these high-level synchronization primitives help the programmer express the program logic conveniently, they also carry associated overheads. In this paper, we identify the sources of some of these overheads in task-parallel languages like X10 that support lock-step synchronization, and we propose a mechanism to reduce them. We first propose three desirable properties that an efficient runtime (for task-parallel languages like X10, HJ, Chapel, and so on) should satisfy to minimize the overheads of lock-step synchronization. We use these properties to derive a scheme called uClocks to improve the efficiency of X10 clocks; uClocks consists of an extension to X10 clocks and two related runtime optimizations. We prove that uClocks satisfies the proposed desirable properties. We have implemented uClocks for the X10 language and runtime, and show that the resulting system achieves a geometric mean speedup of 5.36x on a 16-core Intel system and 11.39x on a 64-core AMD system for benchmarks with a significant number of synchronization operations.
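The abstract does not describe how uClocks works internally, so the following is only a minimal sketch of the lock-step pattern itself, not of X10 clocks or uClocks. It assumes Python's `threading.Barrier` as a stand-in for a clock, and `run_lockstep` is a hypothetical helper. Every call to `barrier.wait()` is a synchronization point at which all tasks block until the slowest one arrives; repeated once per phase, this is the kind of per-phase cost that lock-step runtimes must keep small.

```python
import threading


def run_lockstep(num_tasks: int, num_phases: int) -> list[tuple[int, int]]:
    """Run num_tasks threads through num_phases phases in lock-step.

    Hypothetical illustration: threading.Barrier stands in for an X10 clock.
    Returns the (phase, task_id) records in the order they were logged.
    """
    barrier = threading.Barrier(num_tasks)  # cyclic: reusable for every phase
    log: list[tuple[int, int]] = []
    log_lock = threading.Lock()

    def task(task_id: int) -> None:
        for phase in range(num_phases):
            # Per-phase "work": record that this task finished the phase.
            with log_lock:
                log.append((phase, task_id))
            # Analogous to advancing a clock: no task starts phase + 1
            # until every task has finished this phase.
            barrier.wait()

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

For example, `run_lockstep(4, 3)` returns 12 records whose phase numbers never decrease, because no task can log phase 1 until all four tasks have logged phase 0 and passed the barrier.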

Keywords

lock-step synchronization, task-parallel languages, runtime optimizations