Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters

IEEE Transactions on Parallel and Distributed Systems (2015)

Abstract
This paper describes an implementation of the Barnes-Hut algorithm on multicore clusters. Built on a partitioned global address space (PGAS) library, the design integrates intranode multithreading with internode one-sided communication, exemplifying a PGAS + X programming style. Within a node, the computation is decomposed into tasks (subtasks), and multitasking is used to hide network latency. We study the tradeoffs between locality in private caches and locality in shared caches and incorporate the resulting insights into the design. As a result, our implementation consumes less memory per core, incurs less internode communication, and uses better load-balancing strategies. The final code achieves up to a 41% performance improvement over a non-multithreaded counterpart. Through detailed comparison, we also show its advantages over other well-known Barnes-Hut implementations in both programming complexity and performance.
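For readers unfamiliar with the underlying algorithm, the following is a minimal serial sketch of the textbook Barnes-Hut approximation in two dimensions. It is not the paper's PGAS or multithreaded design, which the abstract does not detail. All names here (`Cell`, `build_tree`, `accel`, `direct_accel`, the opening parameter `theta`) are hypothetical, and the sketch assumes bodies have distinct positions inside the unit square and positive masses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Body = Tuple[float, float, float]  # (x, y, mass); hypothetical layout


@dataclass
class Cell:
    cx: float                 # center of this square cell
    cy: float
    half: float               # half of the cell's side length
    mass: float = 0.0         # total mass of bodies inside the cell
    mx: float = 0.0           # mass-weighted sum of x coordinates
    my: float = 0.0           # mass-weighted sum of y coordinates
    body: Optional[Body] = None
    children: Optional[List["Cell"]] = None

    def _split(self) -> None:
        h = self.half / 2
        self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child_for(self, b: Body) -> "Cell":
        # Index matches the child order created in _split.
        return self.children[(b[0] >= self.cx) + 2 * (b[1] >= self.cy)]

    def insert(self, b: Body) -> None:
        if self.body is None and self.children is None:
            self.body = b                      # empty leaf: store the body
        else:
            if self.children is None:          # occupied leaf: subdivide
                self._split()
                old, self.body = self.body, None
                self._child_for(old).insert(old)
            self._child_for(b).insert(b)
        self.mass += b[2]
        self.mx += b[2] * b[0]
        self.my += b[2] * b[1]


def build_tree(bodies: List[Body]) -> Cell:
    root = Cell(0.5, 0.5, 0.5)                 # assumes bodies lie in [0, 1) x [0, 1)
    for b in bodies:
        root.insert(b)
    return root


def accel(cell: Cell, x: float, y: float, theta: float = 0.5) -> Tuple[float, float]:
    """Approximate acceleration at (x, y) with G = 1."""
    if cell.mass == 0.0:
        return (0.0, 0.0)
    if cell.children is None:                  # leaf holding a single body
        bx, by, m = cell.body
        if (bx, by) == (x, y):                 # skip self-interaction
            return (0.0, 0.0)
        dx, dy = bx - x, by - y
    else:
        m = cell.mass
        dx, dy = cell.mx / m - x, cell.my / m - y
        # Opening criterion: treat the cell as one point mass only if
        # (side length / distance) < theta; otherwise descend.
        if (2 * cell.half) ** 2 >= theta * theta * (dx * dx + dy * dy):
            ax = ay = 0.0
            for child in cell.children:
                cax, cay = accel(child, x, y, theta)
                ax, ay = ax + cax, ay + cay
            return (ax, ay)
    r2 = dx * dx + dy * dy
    inv = m / (r2 * math.sqrt(r2))
    return (dx * inv, dy * inv)


def direct_accel(bodies: List[Body], x: float, y: float) -> Tuple[float, float]:
    """Exact pairwise sum, for comparison with the tree approximation."""
    ax = ay = 0.0
    for bx, by, m in bodies:
        if (bx, by) != (x, y):
            dx, dy = bx - x, by - y
            r2 = dx * dx + dy * dy
            inv = m / (r2 * math.sqrt(r2))
            ax, ay = ax + dx * inv, ay + dy * inv
    return (ax, ay)
```

With `theta = 0` the traversal opens every cell and reduces to the direct sum; larger values trade accuracy for fewer interactions. In a distributed setting, the descent into `cell.children` is the step that may touch remote data, which is where latency hiding through multitasking, as the abstract describes, would apply.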
Keywords
partitioned global address space, algorithm design and analysis, multicore processing, multithreading, PGAS, programming, synchronization, N-body, Barnes-Hut, force, cluster, multicore, instruction sets