Architecture optimizations for synchronization and communication on chip multiprocessors

Miami, FL (2008)



Abstract

Chip multiprocessors (CMPs) enable concurrent execution of multiple threads using several cores on a die. Current CMPs behave much like symmetric multiprocessors and do not take advantage of the proximity between cores to improve synchronization and communication between concurrent threads. Thread synchronization and communication instead use memory/cache interactions. We propose two architectural enhancements to support fine-grained synchronization and communication between threads that reduce overhead and memory/cache contention. Register-based synchronization exploits the proximity between cores to provide low-latency shared registers for synchronization. This approach can save significant power over spin waiting when blocking events that suspend the core are used. Pre-pushing provides software-controlled data forwarding between caches to reduce coherence traffic and improve cache latency and hit rates. We explore the behavior of these approaches and evaluate their effectiveness at improving synchronization and communication performance on CMPs with private caches. Our simulation results show significant reductions in inter-core traffic, latencies, and miss rates.
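For orientation, the memory-based baseline that the abstract contrasts against can be sketched in portable C++. This is only an illustration of synchronization and communication through shared memory (a flag plus a payload); it does not implement register-based synchronization or pre-pushing, which require hardware support not expressible here. The names `Mailbox` and `handoff` are hypothetical and do not come from the paper.

```cpp
#include <atomic>
#include <thread>

// Baseline only: synchronization and communication through shared memory,
// the pattern the abstract says current CMPs rely on. This is NOT the
// paper's register-based or pre-pushing mechanism.
struct Mailbox {
    int payload = 0;                 // data communicated via the cache hierarchy
    std::atomic<bool> ready{false};  // synchronization flag, also held in memory
};

// Hypothetical helper: hands `value` from a producer thread to a consumer thread.
int handoff(int value) {
    Mailbox box;
    int received = 0;

    std::thread consumer([&] {
        // Spin wait: the core stays active, repeatedly reading the flag.
        while (!box.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // First read of the payload after the flag flips; on private caches
        // this is typically a coherence miss served from the producer's cache.
        received = box.payload;
    });

    std::thread producer([&] {
        box.payload = value;                               // write data
        box.ready.store(true, std::memory_order_release);  // publish it
    });

    producer.join();
    consumer.join();
    return received;
}
```

Calling `handoff(42)` returns the value written by the producer. The spin loop and the consumer's first read of `payload` correspond to the two costs the abstract targets: waiting on a flag held in memory, and fetching data only after it is requested.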


Keywords

cache storage, microprocessor chips, multi-threading, synchronisation, architecture optimizations, chip multiprocessors, communication performance, concurrent execution, low-latency shared registers, memory-cache interactions, multithreading, register-based synchronization, software controlled data forwarding, synchronization performance
