Efficient lock-step synchronization in task-parallel languages.

Akshay Utture,V. Krishna Nandivada

SOFTWARE-PRACTICE & EXPERIENCE(2019)

引用 1|浏览12
暂无评分
摘要
Many modern task-parallel languages allow the programmer to synchronize tasks using high-level constructs like barriers, clocks, and phasers. While these high-level synchronization primitives help the programmer express the program logic in a convenient manner, they also have their associated overheads. In this paper, we identify the sources of some of these overheads for task-parallel languages like X10 that support lock-step synchronization, and propose a mechanism to reduce these overheads. We first propose three desirable properties that an efficient runtime (for task-parallel languages like X10, HJ, Chapel, and so on) should satisfy, to minimize the overheads during lock-step synchronization. We use these properties to derive a scheme to called uClocks to improve the efficiency of X10 clocks; uClocks consists of an extension to X10 clocks and two related runtime optimizations. We prove that uClocks satisfies the proposed desirable properties. We have implemented uClocks for the X10 language+runtime and show that the resulting system leads to a geometric mean speedup of 5.36x on a 16-core Intel system and 11.39x on a 64-core AMD system, for benchmarks with a significant number of synchronization operations.
更多
查看译文
关键词
lock-step synchronization,task-parallel languages,runtime optimizations
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要