
ICLA Unit: Intra-Cluster Locality-Aware Unit to Reduce L-2 Access and NoC Pressure in GPGPUs

Siamak Biglari Ardabili, Gholamreza Zare Fatin

Journal of Circuits, Systems and Computers (2022)

Abstract
As the number of streaming multiprocessors (SMs) in GPUs increases to gain better performance, the reply network faces heavy traffic, causing congestion in Network-on-Chip (NoC) routers and memory controller (MC) buffers. Because cooperative thread arrays (CTAs) are scheduled locally within clusters, there is a high probability of finding a copy of the requested data in the L-1 cache of another SM in the same cluster. To make this feasible, SMs must be able to access the local L-1 caches of neighboring SMs. The NoC suffers considerable congestion due to its unique traffic pattern, called many-to-few-to-many. Thanks to the reduced number of requests achieved by our proposed Intra-Cluster Locality-Aware (ICLA) unit, this congested reply traffic becomes a many-to-many pattern, and replied data travels over the less-utilized core-to-core links, which mitigates NoC traffic. The proposed architecture has been evaluated using 15 different workloads from the CUDA SDK, Rodinia, and ISPASS2009 benchmark suites; the ICLA unit has been modeled and simulated in GPGPU-Sim. The results show about a 23.79% (up to 49.82%) reduction in average network latency, a 15.49% (up to 36.82%) reduction in average L-2 cache accesses, and an 18.18% (up to 58.1%) average improvement in instructions per cycle (IPC).
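The intra-cluster lookup described above can be sketched in a few lines: on an L-1 miss, the requesting SM first probes the L-1 caches of its cluster siblings and only falls back to an L-2 access over the NoC when no sibling holds the line. This is a minimal conceptual sketch, not the paper's hardware design; all class and method names (`SM`, `Cluster`, `load`) are hypothetical.

```python
# Conceptual sketch of intra-cluster locality-aware lookup.
# All names are illustrative, not from the paper's implementation.

class SM:
    def __init__(self, sm_id):
        self.sm_id = sm_id
        self.l1 = set()  # addresses currently resident in this SM's L-1

class Cluster:
    def __init__(self, sms):
        self.sms = sms
        self.l2_accesses = 0    # requests that went to L-2 over the NoC
        self.c2c_transfers = 0  # requests served by core-to-core links

    def load(self, sm, addr):
        if addr in sm.l1:                 # local L-1 hit
            return "L1"
        for peer in self.sms:             # ICLA-style probe of siblings
            if peer is not sm and addr in peer.l1:
                self.c2c_transfers += 1   # served without touching the NoC
                sm.l1.add(addr)           # fill the requester's L-1
                return "peer-L1"
        self.l2_accesses += 1             # fall back to L-2 via the NoC
        sm.l1.add(addr)
        return "L2"

cluster = Cluster([SM(i) for i in range(4)])
cluster.sms[0].l1.add(0x100)
print(cluster.load(cluster.sms[1], 0x100))  # served from a sibling's L-1
print(cluster.load(cluster.sms[2], 0x200))  # miss everywhere, goes to L-2
```

Every request served by a sibling's L-1 is one fewer reply crossing the congested many-to-few-to-many network, which is the mechanism behind the reported L-2 access and latency reductions.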
Keywords
GPGPU, NoC, memory controller, GPGPU-Sim, instruction per cycle, streaming multiprocessors