Using General-Purpose Processor Cores As Prefetching Engines in Chip Multiprocessor Architectures

Martin Burtscher,Ilya Konstantinovich Ganusov

Using general-purpose processor cores as prefetching engines in chip multiprocessor architectures（2007）

引用 23|浏览8

暂无评分

摘要

Scaling the performance of applications with little thread-level parallelism is one of the most serious impediments to the success of multi-core architectures. At the same time, the long latency of memory accesses represents one of the largest performance bottlenecks for individual program threads. As a result, a typical microprocessor spends a significant amount of time waiting for data to be delivered from memory instead of performing useful computation.Fortunately, it is often possible to guess which memory data will be needed by a program thread in the near future. Various hardware and software prefetching techniques have been developed to fetch critical data before they are requested by the processor. This way prefetching can eliminate processor stalls otherwise induced by the slow response from the memory system.The main contribution of this dissertation is the development of two techniques that utilize extra cores of a chip multiprocessor (CMP) as prefetching engines to increase the performance of single program threads. The proposed approaches effectively leverage the execution capabilities of chip multiprocessors to compute data addresses that are likely to miss in the cache and prefetch them ahead of program thread load requests.I demonstrate the effectiveness of the proposed approaches by performing cycle-accurate simulations of a chip multiprocessor consisting of two four-way superscalar cores running the single-threaded SPEC CPU2000 benchmark suite. The proposed mechanisms provide significant performance improvements over a baseline that already includes an aggressive hardware stream prefetcher. A comparison with other multi-core prefetching mechanisms from the literature shows that the techniques proposed in this dissertation provide competitive performance, incur less energy overhead, and require considerably simpler hardware support.

查看译文

关键词

proposed approach,chip multiprocessor,competitive performance,largest performance bottleneck,significant performance improvement,critical data,data address,individual program thread,memory access,memory data,chip multiprocessor architecture,general-purpose processor core,prefetching engine

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要