Dynamic Scheduling with Narrow Operand Values

msra(2005)

Abstract

Tomasulo’s algorithm creates a dynamic execution order that extracts a high degree of instruction-level parallelism from a sequential program. Modern processors create this schedule early in the pipeline, before operand values have been computed, since present-day cycle-time demands preclude inclusion of a full ALU and bypass network delay in the instruction scheduling loop. Hence, modern schedulers must predict the latency of load instructions, since load latency cannot be determined within the scheduling pipeline. Whenever load latency is mispredicted due to an unanticipated cache miss or store alias, a significant amount of power is wasted due to incorrectly issued dependent instructions that are already traversing the execution pipeline. This paper exploits the prevalence of narrow operand values (i.e., ones with fewer significant bits) to solve this problem, by placing a fast, narrow ALU and datapath within the scheduling loop. Virtually all load latency mispredictions can be accurately anticipated with this narrow datapath, and little power is wasted on executing incorrectly scheduled instructions. We show that such a narrow datapath design, coupled with a novel partitioned store queue and pipelined data cache, can achieve a cycle time comparable to conventional approaches, while dramatically reducing misspeculation, saving power, and improving per-cycle performance. Finally, we show that due to the rarity of misspeculation in our architecture, a less-complex flush-based recovery scheme suffices for high performance.

Keywords: scheduler, issue-queue, partial operands, microarchitecture.

1 Introduction and Motivation

Over the last two decades, microprocessors have evolved from relatively straightforward pipelined, largely non-speculative implementations to deeply pipelined machines with out-of-order execution and a high degree of speculation to maximize performance benefit.
One technique that is commonly implemented in current-generation designs is load latency speculation. In this technique, the scheduler speculates on load latency by assuming no store
Keywords
out of order execution,network delay,cycle time,instruction scheduling,dynamic scheduling