Cacheline Utilization-Aware Link Traffic Compression for Modular GPUs

2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID)

Abstract
Modular GPU systems are increasingly gaining attention for continued scaling of system sizes under packaging and fabrication challenges. In such systems, limited interconnect bandwidth can be a concern for system performance and energy efficiency. Compression has been widely considered an effective means to reduce the amount of data moved. However, most prior schemes are limited to exploiting the cacheline data values for achieving high compression ratios. We propose CUALiT: Cacheline Utilization-Aware Link Traffic compression to reduce I/O link traffic for modular GPU systems. Our approach exploits the variation in temporal and spatial utilization of individual cacheline words to achieve higher compression ratios. We use a novel mechanism to predict the utilization of cachelines across warps at word granularity. Words predicted to be unutilized are dropped from responses. Furthermore, latency-critical words are compressed using traditional methods, while words with temporal slack are coalesced across cachelines and compressed lazily to achieve higher compression ratios. CUALiT reduces off-chip link traffic by 14% on average while achieving up to 25% lower system energy and an average 11% (up to 2x) higher performance over a compression-only scheme.
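The word-dropping idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the cacheline geometry (32 four-byte words per 128-byte line), the function names, and the fixed strided-access mask standing in for CUALiT's cross-warp utilization predictor are all assumptions made for illustration.

```python
# Illustrative sketch only, not CUALiT's actual mechanism.
# Assumption: 128-byte cachelines of 32 four-byte words; the
# predicted_used_mask would come from a cross-warp, word-granularity
# utilization predictor (hypothetical stand-in here).

WORDS_PER_LINE = 32

def pack_response(cacheline, predicted_used_mask):
    """Drop words predicted to be unutilized; send mask + packed words."""
    assert len(cacheline) == WORDS_PER_LINE
    packed = [w for w, used in zip(cacheline, predicted_used_mask) if used]
    return predicted_used_mask, packed

def unpack_response(mask, packed, fill=0):
    """Reconstruct a full-width line at the receiver.

    Dropped words are filled with a placeholder; a mispredicted
    (actually-needed) word would require a follow-up fetch.
    """
    it = iter(packed)
    return [next(it) if used else fill for used in mask]

# Example: a strided access pattern that touches 1 of every 4 words.
line = list(range(WORDS_PER_LINE))
mask = [i % 4 == 0 for i in range(WORDS_PER_LINE)]
m, packed = pack_response(line, mask)
print(len(packed))  # 8 words transferred instead of 32
restored = unpack_response(m, packed)
```

In this toy case the link carries 8 data words plus a 32-bit mask instead of 32 words, a roughly 4x reduction for the data payload; the paper's gains are smaller on average because real utilization is less regular and mispredictions must be handled.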
Keywords
cacheline utilization-aware link traffic compression, lower system energy, off-chip link traffic, higher compression ratios, individual cacheline words, spatial utilization, temporal utilization, high compression ratios, cacheline data values, system performance, modular GPU systems