Towards a Single-Host Many-GPU System

2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)(2018)

引用 1|浏览79
暂无评分
摘要
As computation-intensive tasks such as deep learning and big data analysis take advantage of GPU based accelerators, the interconnection links may become a bottleneck. In this paper, we investigate the upcoming performance bottleneck of multi-accelerator systems, as the number of accelerators equipped with single host grows. We instrumented the host PCIe fabric to measure the data transfer and compared it with the measurements from the software tool. It shows how the data transfer (P2P) helps to avoid the bottleneck on the interconnection links, but multi-GPU performance does not scale up as expected due to the control messages. We quantify the impact of host control messages with suggestions to remedy scalability bottlenecks. We also implement the proposed strategy on Lulesh to validate the concept. The result shows our strategy can save 59.86% time cost of the kernel and 13.32% PCIe H2D payload.
更多
查看译文
关键词
Measurement,Peer-to-peer,GPU,Deep learning,PCIe,NVLINK
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要