A Work Stealing Scheduler For Parallel Loops On Shared Cache Multicores

Euro-Par 2010: Proceedings of the 2010 Conference on Parallel Processing (2011)

Cited by 14
Abstract
Reordering instructions and data layout can yield significant performance improvements for memory-bound applications. Parallelizing such applications requires careful algorithm design to preserve the locality of the sequential execution. In this paper, we aim at a parallelization of memory-bound applications on multicores that preserves the advantage of a shared cache. We focus on sequential applications that iterate through a sequence of memory references. Our solution relies on a work-stealing scheduler combined with a dynamic sliding window that constrains cores sharing the same cache to process data that are close in memory. This parallel algorithm induces the same number of cache misses as the sequential algorithm, at the expense of an increased number of synchronizations. Experiments with a memory-bound application confirm that core collaboration for shared-cache access can bring significant performance improvements despite the incurred synchronization costs.
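The abstract gives no pseudo-code, so the following is a minimal sketch of the windowed-loop idea under stated assumptions, not the authors' actual scheduler: the paper's window is dynamic while this one has a fixed size, and work stealing between per-core deques is approximated by a single shared atomic counter that all cores pull iterations from. The names N, WINDOW, CORES, and process() are illustrative assumptions.

```cpp
// Sketch: cores sharing a cache cooperate on one window of loop
// iterations at a time and only slide the window forward once every
// iteration in it has been claimed, keeping all cores close in memory.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr std::size_t N = 1 << 20;   // total loop iterations (assumed)
constexpr std::size_t WINDOW = 4096; // window size, tuned to the shared cache
constexpr int CORES = 4;             // cores sharing one cache (assumed)

std::vector<float> data(N, 1.0f);

void process(std::size_t i) { data[i] *= 2.0f; } // stand-in loop body

std::atomic<std::size_t> next_iter{0};   // next unclaimed iteration
std::atomic<int> remaining{CORES};       // cores still inside this window
std::atomic<std::size_t> window_base{0}; // start of the open window

void worker() {
    std::size_t base = 0;
    while (base < N) {
        const std::size_t limit = std::min(base + WINDOW, N);
        // Claim iterations inside the current window; the shared counter
        // stands in for stealing between per-core deques.
        for (std::size_t i = next_iter.fetch_add(1); i < limit;
             i = next_iter.fetch_add(1)) {
            process(i);
        }
        // The last core to leave the window slides it forward; the others
        // wait. This per-window synchronization is the cost the paper
        // trades for sequential-like cache behavior.
        if (remaining.fetch_sub(1) == 1) {
            next_iter.store(limit);   // undo any overshoot past the window
            remaining.store(CORES);
            window_base.store(limit); // release the waiting cores
        } else {
            while (window_base.load() < limit) std::this_thread::yield();
        }
        base = limit;
    }
}

int main() {
    std::vector<std::thread> pool;
    for (int c = 0; c < CORES; ++c) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    std::printf("data[0] = %.1f\n", data[0]);
}
```

The barrier at each window boundary is where the "increased number of synchronizations" shows up in this sketch: no core may run ahead of the window, so the working set of all cores stays within one window-sized region of memory, matching the sequential traversal order.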
Keywords
memory-bound application, significant performance improvement, memory reference, shared cache, shared cache access, parallel algorithm, sequential algorithm, sequential application, sequential execution, data layout, parallel loop, shared cache multicores