Exploiting Adaptive Data Compression To Improve Performance And Energy-Efficiency Of Compute Workloads In Multi-Gpu Systems

Mohammad Khavari Tavana, Yifan Sun, Nicolas Bohm Agostini, David R. Kaeli

2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019)（2019）

引用 17|浏览33

暂无评分

摘要

Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale of number of transistors on chip, in order to satisfy the ever-increasing demands for more computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast and energy-efficient monolithic GPUs, while growing the number of stream processing units on-chip, is no longer a viable solution to scale performance.GPU vendors are aiming to exploit multi-GPU solutions, interconnecting multiple GPUs in the single node with a high bandwidth network (such as NVLink), or exploiting Multi-Chip Module (MCM) packaging, where multiple GPU modules are integrated in a single package. The inter-GPU bandwidth is an expensive and critical resource for designing multi-GPU systems. The design of the inter-GPU network can impact performance significantly. To address this challenge, in this paper we explore the potential of hardware-based memory compression algorithms to save bandwidth and improve energy efficiency in multi-GPU systems. Specifically, we propose an adaptive inter-GPU data compression scheme to efficiently improve both performance and energy efficiency. Our evaluation shows that the proposed optimization on multi-GPU architectures can reduce the inter-GPU traffic up to 62%, improve system performance by up to 33%, and save energy spent powering the communication fabric by 45%, on average.

查看译文

关键词

Multi-GPU system, Multi-Chip-Module, Compression algorithms, Bandwidth management, Performance

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要