Code alignment for architectures with pipeline group dispatching

Omer Boehm, Gadi Haber, Helena Kosachevsky

SYSTOR '10: Proceedings of the 3rd Annual Haifa Experimental Systems Conference (2010)

Abstract
Today's architectures exploit long pipelines to increase instruction-level parallelism: they group sets of consecutive instructions and feed each group into the pipeline so that it can be executed in a single cycle. The IBM Power architecture executes programs by dispatching groups of instructions, where a dispatch group is fed as a whole into the pipeline to be executed in a single cycle. Such an architecture, however, suffers many pipeline delays that result from dependencies between the resources of separate groups. The code must therefore be optimized so that the architecture can place the instructions in a way that produces as few delays as possible. Optimizing the alignment and placement of the code is thus crucial to program performance on such architectures. We show that in some cases, without proper code alignment, performance can degrade by 40% because of the impact of code alignment on grouping and pipeline delays. We present a new binary-level, profile-based code alignment algorithm for architectures that use group dispatching. We show a steady performance gain of about 2-3% for fully optimized code running on IBM Power 6, while completely eliminating the performance instability that, in the absence of the proposed code alignment algorithm, can sometimes cause up to 40% variation in performance. Because the algorithm is based on gathered profiling information and is applied at the binary level, it can be used as part of existing dynamic compilers and enabled on top of the operating system at runtime.
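To make the link between alignment and grouping concrete, here is a minimal toy sketch. It is not the algorithm presented in the paper. It assumes, purely for illustration, that a dispatch group never crosses a boundary of `group_size` instructions, and the names `groups_spanned` and `padding_to_align` as well as the group size of 4 are hypothetical. The only architectural fact relied on is that Power instructions are 4 bytes wide.

```python
INSTR_BYTES = 4  # Power instructions are fixed-width, 4 bytes each


def groups_spanned(start_addr: int, n_instrs: int, group_size: int) -> int:
    """Count the aligned windows touched by n_instrs instructions at start_addr.

    Toy model (assumption): a dispatch group never crosses a boundary of
    group_size * INSTR_BYTES bytes. Real group formation follows more rules.
    """
    first_slot = start_addr // INSTR_BYTES
    last_slot = first_slot + n_instrs - 1
    return last_slot // group_size - first_slot // group_size + 1


def padding_to_align(start_addr: int, group_size: int) -> int:
    """Bytes of padding needed to move start_addr up to the next window boundary."""
    window = group_size * INSTR_BYTES
    return (-start_addr) % window


if __name__ == "__main__":
    loop_start, loop_len, group_size = 0x100C, 8, 4  # hypothetical hot loop
    pad = padding_to_align(loop_start, group_size)
    print(groups_spanned(loop_start, loop_len, group_size))        # misaligned
    print(groups_spanned(loop_start + pad, loop_len, group_size))  # aligned
```

Under these assumptions, an 8-instruction loop starting at 0x100C touches three windows, while 4 bytes of padding move it to 0x1010, where it touches two. A profile-based approach would spend such padding only on code that the profile marks as hot.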
Keywords
steady performance gain, proper code alignment, optimized code, pipeline delay, long pipeline, proposed code alignment algorithm, profile-based code alignment algorithm, performance instability, pipeline group, code alignment, single cycle, operating system, optimization, dynamic compilation