Auto-Tuning of Data Communication on Heterogeneous Systems

Embedded Multicore Socs(2013)

引用 1|浏览2
暂无评分
摘要
Heterogeneous systems formed by traditional CPUs and compute accelerators, such as GPUs, are becoming widely used to build modern supercomputers. However, many different system topologies (i.e., how CPUs, accelerators, and I/O devices are interconnected) are being deployed. Each system organization presents different trade-offs when transferring data between CPUs, accelerators, and nodes within a cluster, requiring different software implementations to achieve optimal data communication bandwidth. In this paper we explore the potential impact of two optimizations to achieve optimal data transfer bandwidth: topology-aware process placement policies, and double-buffering. We design a set of experiments to evaluate all possible alternatives, and run each of them on different hardware configurations. We show that optimal data transfer mechanisms depend on both the hardware topology and the application dataset size. Our experimental evaluation shows that auto-tuning applications to match the hardware topology, and to find the best double-buffering configuration can improve the data transfers bandwidth up to 70% for local communication and is key to achieve optimal bandwidth in remote communication for data transfers larger than 128KB.
更多
查看译文
关键词
heterogeneous systems,different software implementation,different hardware configuration,different system topology,different trade-offs,optimal data transfer bandwidth,data communication,optimal bandwidth,hardware topology,optimal data transfer mechanism,optimal data communication bandwidth,data transfer,topology,parallel processing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要