Extending the Scalability of Single Chip Stream Processors with On-chip Caches

MSRA (2008)

Citations: 23 | Views: 7
Abstract
As semiconductor scaling continues, more transistors can be put onto the same chip despite growing challenges in clock frequency scaling. Stream processor architectures can make effective use of these additional resources for appropriate applications. However, it is important that programmer effort be amortized across future generations of stream processor architectures. Current industry projections suggest a single chip may be able to integrate several thousand 64-bit floating-point ALUs within the next decade. Future designs will require significantly larger, scalable on-chip interconnection networks, which will likely increase memory access latency. While the capacity of the explicitly managed local store of current stream processor architectures could be enlarged to tolerate the added latency, existing stream processing software may require significant programmer effort to leverage such modifications. In this paper we propose a scalable stream processing architecture that addresses this issue. In our design, each stream processor has an explicitly managed local store model backed by an on-chip cache hierarchy. We evaluate our design using several parallel benchmarks to show the trade-offs of various cache and DRAM configurations. We show that the addition of a 256KB L2 cache per memory controller increases the performance of our 16-, 64- and 121-node stream processor designs (containing 128, 896, and 1760 ALUs, respectively) by 14.5%, 54.9% and 82.3% on average, respectively. We find that even those applications that utilize the local store in our study benefit significantly from the addition of L2 caches.
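The abstract's core argument is that an on-chip L2 cache behind the local store hides the growing interconnect and DRAM latency of larger designs. The toy average-memory-access-time (AMAT) model below sketches that intuition; all latency and hit-rate numbers are illustrative assumptions, not figures from the paper.

```python
# Hedged sketch: a toy AMAT model of a local-store miss that must go
# off-cluster, with and without an L2 cache at the memory controller.
# Latencies and the hit rate are assumed values for illustration only.

def amat(l2_hit_rate: float, l2_latency: int, dram_latency: int) -> float:
    """Average latency (cycles) of an off-cluster access:
    hits are served by the L2, misses pay the full DRAM round trip."""
    return l2_hit_rate * l2_latency + (1.0 - l2_hit_rate) * dram_latency

# Assumed cycle counts for a large-network design: the L2 sits near the
# memory controller, so it is far cheaper to reach than off-chip DRAM.
L2_LATENCY = 40
DRAM_LATENCY = 400

no_cache = amat(0.0, L2_LATENCY, DRAM_LATENCY)    # every miss pays DRAM cost
with_cache = amat(0.8, L2_LATENCY, DRAM_LATENCY)  # assumed 80% L2 hit rate

print(f"AMAT without L2: {no_cache:.0f} cycles")   # 400 cycles
print(f"AMAT with 80% L2 hit rate: {with_cache:.0f} cycles")  # 112 cycles
```

Even a modest assumed hit rate cuts the average off-cluster latency severalfold, which is consistent with the paper's observation that the benefit grows with node count, since larger networks make the DRAM round trip relatively more expensive.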