Understanding the Data Traffic of Uncore in Westmere NUMA Architecture

Parallel, Distributed and Network-Based Processing (2014)

Abstract
Non-Uniform Memory Access (NUMA) has become the mainstream architecture of modern servers. Within the processor, the Uncore plays a very important role, especially in NUMA systems, because it connects the cores, the Last Level Cache (LLC), multiple on-chip Memory Controllers (MCs), and the high-speed interconnects. A recent study shows that Uncore congestion plays a more important role than locality. A better understanding of Uncore behavior is therefore needed to alleviate congestion and use a given architecture efficiently. Our work focuses on the imbalance and congestion of data traffic in the processor's Uncore. We choose an Intel NUMA architecture named "Westmere" and use hardware performance counters to investigate the data flow of several benchmarks in the Uncore. Our experiments show that: (1) the data imbalance among the trackers of the Global Queue (GQ) and the QuickPath Home Logic (QHL) is severe, with the largest imbalance ratio exceeding 1000 times, so a new dynamic entry management algorithm is needed to improve entry utilization; (2) congestion in the GQ and QHL trackers behaves differently as the number of threads increases; and (3) for a given memory access pattern, congestion in the GQ and QHL trackers grows linearly with the problem size.
Keywords
MEMORY MANAGEMENT,MULTIPROCESSORS,PLACEMENT,SYSTEMS