An adaptive self-scheduling loop scheduler

Concurrency and Computation: Practice and Experience (2022)

Abstract
Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner, even though the work per loop iteration can vary greatly depending on the application and the input. Because LS is so important, many different methods (e.g., workload-aware self-scheduling) and parameters (e.g., chunk size) have been explored to achieve reasonable performance, and many of these methods require expert prior knowledge about the application and input before runtime. This work proposes a new LS method that requires little to no expert knowledge yet achieves speedups close to those of tuned LS methods, by self-managing the chunk size with a throughput-based heuristic and using work-stealing to recover from workload imbalances. This method, named iCh, is implemented in libgomp for testing. It is evaluated against OpenMP's guided, dynamic, and taskloop methods, as well as against BinLPT and generic work-stealing, on an array of applications that includes a synthetic benchmark, breadth-first search, K-Means, the molecular dynamics code LavaMD, and sparse matrix-vector multiplication. On a 28-thread Intel system, iCh is the only method that is always among the top three LS methods. On average across all applications, iCh is within 5.4% of the best method, and it even outperforms other LS methods for breadth-first search and K-Means.
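The abstract does not spell out iCh's actual heuristic, so the following is only a minimal, single-threaded simulation of the general idea it describes: adaptive chunk sizes plus work-stealing. Everything in it is an assumption for illustration: the names `adaptive_schedule` and `Worker`, the rule "double the chunk when a worker's last-chunk throughput beats the team average, halve it when it falls behind", the `max_chunk` cap, and the choice to steal the back half of the largest remaining range. It is not the libgomp implementation of iCh.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Worker:
    lo: int            # next unclaimed iteration in this worker's local range
    hi: int            # end (exclusive) of the local range
    chunk: int         # current chunk size, adapted at runtime
    rate: float = 0.0  # throughput (iterations per time unit) of the last chunk


def adaptive_schedule(costs, nthreads, init_chunk=4, min_chunk=1, max_chunk=256):
    """Simulate one parallel loop over len(costs) iterations (all costs > 0).

    Hypothetical sketch, not iCh itself. Returns (makespan, owner), where
    owner[i] is the worker that ran iteration i.
    """
    n = len(costs)
    step = -(-n // nthreads)  # ceiling division: one contiguous block per worker
    workers = [Worker(min(i * step, n), min((i + 1) * step, n), init_chunk)
               for i in range(nthreads)]
    owner = [-1] * n
    events = [(0.0, w) for w in range(nthreads)]  # (time the worker is free, id)
    heapq.heapify(events)
    makespan = 0.0
    while events:
        now, w = heapq.heappop(events)
        me = workers[w]
        if me.lo >= me.hi:
            # Local range exhausted: steal the back half of the largest range.
            victim = max(workers, key=lambda v: v.hi - v.lo)
            remaining = victim.hi - victim.lo
            if remaining <= 1:
                makespan = max(makespan, now)  # nothing worth stealing: retire
                continue
            mid = victim.lo + remaining // 2
            me.lo, me.hi = mid, victim.hi
            victim.hi = mid
            me.chunk = init_chunk  # stolen work has unknown cost: restart small
        # Claim the next chunk from the local range and "run" it.
        end = min(me.lo + me.chunk, me.hi)
        cost = sum(costs[me.lo:end])
        for i in range(me.lo, end):
            owner[i] = w
        me.rate = (end - me.lo) / cost
        me.lo = end
        # Assumed heuristic: grow the chunk when this worker's throughput beats
        # the team average, shrink it when it falls behind.
        rates = [v.rate for v in workers if v.rate > 0]
        avg = sum(rates) / len(rates)
        if me.rate > avg:
            me.chunk = min(me.chunk * 2, max_chunk)
        elif me.rate < avg:
            me.chunk = max(me.chunk // 2, min_chunk)
        heapq.heappush(events, (now + cost, w))
    return makespan, owner


if __name__ == "__main__":
    costs = [1.0] * 900 + [50.0] * 100  # the last 10% of iterations are 50x heavier
    makespan, owner = adaptive_schedule(costs, nthreads=4)
    print(makespan)
```

With a plain static block partition of this example over 4 workers, worker 3 would receive iterations 750 to 999, i.e. all 100 heavy iterations, for a cost of 150 + 5000 = 5150 time units. In the sketch, the other workers steal from that tail once their own blocks run out, so the makespan lands well below that. Resetting the chunk size after a steal is one design choice among several: a worker whose chunk grew large on cheap iterations would otherwise grab many expensive stolen iterations in a single chunk.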
Keywords
irregular applications, loop scheduling, OpenMP, performance evaluation