Exploiting Adaptive Data Compression to Improve Performance and Energy-Efficiency of Compute Workloads in Multi-GPU Systems
2019 IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS 2019)
Abstract
Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale the number of transistors on chip, in order to satisfy the ever-increasing demands for more computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast, and energy-efficient monolithic GPUs, while growing the number of stream processing units on-chip, is no longer a viable solution to scale performance. GPU vendors are aiming to exploit multi-GPU solutions, interconnecting multiple GPUs in a single node with a high-bandwidth network (such as NVLink), or exploiting Multi-Chip Module (MCM) packaging, where multiple GPU modules are integrated in a single package. Inter-GPU bandwidth is an expensive and critical resource for designing multi-GPU systems, and the design of the inter-GPU network can impact performance significantly. To address this challenge, in this paper we explore the potential of hardware-based memory compression algorithms to save bandwidth and improve energy efficiency in multi-GPU systems. Specifically, we propose an adaptive inter-GPU data compression scheme to efficiently improve both performance and energy efficiency. Our evaluation shows that the proposed optimization on multi-GPU architectures can reduce inter-GPU traffic by up to 62%, improve system performance by up to 33%, and save 45% of the energy spent powering the communication fabric, on average.
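The abstract does not specify the compression algorithm used, but the "adaptive" idea — compress a transfer only when compression actually shrinks it — can be sketched in a few lines. The snippet below is a hypothetical software illustration, not the paper's hardware design: `zlib` stands in for a hardware compressor, and `LINE_SIZE` is an assumed transfer granularity.

```python
import zlib

LINE_SIZE = 128  # assumed bytes per inter-GPU transfer unit


def adaptive_send(line: bytes) -> tuple[bool, bytes]:
    """Adaptively choose a compressed or raw payload for one transfer.

    Returns (compressed_flag, payload): the compressed form is sent
    only when it is actually smaller than the original line, so
    incompressible data never pays an expansion penalty.
    """
    packed = zlib.compress(line, level=1)  # stand-in for a HW compressor
    if len(packed) < len(line):
        return True, packed
    return False, line


# A zero-filled line (common in real workloads) compresses well.
flag, payload = adaptive_send(bytes(LINE_SIZE))
print(flag, len(payload), f"{1 - len(payload) / LINE_SIZE:.0%} traffic saved")
```

The per-transfer flag models the extra metadata bit a real interconnect would carry so the receiver knows whether to decompress.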
Keywords
Multi-GPU system, Multi-Chip-Module, Compression algorithms, Bandwidth management, Performance