Reaping the Benefit of Temporal Silence to Improve Communication Performance

Austin, TX(2005)

引用 1|浏览0
暂无评分
摘要
Communication misses--those serviced by dirty data in remote caches--are a pressing performance limiter in shared-memory multiprocessors. Recent research has indicated that temporally silent stores can be exploited to substantially reduce such misses, either with coherence protocol enhancements (MESTI); by employing specu- lation to create atomic silent store-pairs that achieve speculative lock elision (SLE); or by employing load value prediction (LVP). We evaluate all three approaches utilizing full-system, execution-driven sim- ulation, with scientific and commercial workloads, to measure performance. Our studies indicate that accu- rate detection of elision idioms for SLE is vitally impor- tant for delivering robust performance and appears difficult for existing commercial codes. Furthermore, common datapath issues in out-of-order cores cause bar- riers to speculation and therefore may cause SLE fail- ures unless SLE-specific speculation mechanisms are added to the microarchitecture. We also propose novel prediction and silence detection mechanisms that enable the MESTI protocol to deliver robust performance for all workloads. Finally, we conduct a detailed execution- driven performance evaluation of load value prediction (LVP), another simple method for capturing the benefit of temporally silent stores. We show that while theoret- ically LVP can capture the greatest fraction of commu- nication misses among all approaches, it is usually not the most effective at delivering performance. This occurs because attempting to hide latency by speculating at the consumer, i.e. predicting load values, is fundamentally less effective than eliminating the latency at the source, by removing the invalidation effect of stores. Applying each method, we observe performance changes in appli- cation benchmarks ranging from 1% to 14% for an enhanced version of MESTI, -1.0% to 9% for LVP, -3% to 9% for enhanced SLE, and 2% to 21% for combined techniques.
更多
查看译文
关键词
temporal silence,improve communication performance,microarchitecture,coherence,out of order,pressing,protocols,benchmark testing,frequency,robustness,parallel programming
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要