Communication optimizations for global multi-threaded instruction scheduling

Architectural Support for Programming Languages and Operating Systems (2008)

Abstract
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several limit studies, most of the parallelization opportunities require looking for parallelism beyond local regions of code. To exploit these opportunities, especially for sequential applications, researchers have recently proposed global multi-threaded instruction scheduling techniques, including DSWP (16) and GREMIO (15). These techniques simultaneously schedule instructions from large regions of code, such as arbitrary loop nests or whole procedures, and have been shown to be effective at extracting threads for many applications. A key enabler of these global instruction scheduling techniques is the Multi-Threaded Code Generation (MTCG) algorithm proposed in (16), which generates multi-threaded code for any partition of the instructions into threads. This algorithm inserts communication and synchronization instructions in order to satisfy all inter-thread dependences.

In this paper, we present a general compiler framework, COCO, to optimize the communication and synchronization instructions inserted by the MTCG algorithm. This framework, based on thread-aware data-flow analyses and graph min-cut algorithms, appropriately models and optimizes all kinds of inter-thread dependences, including register, memory, and control dependences. Our experiments, using a fully automatic compiler implementation of these techniques, demonstrate significant reductions (about 30% on average) in the number of dynamic communication instructions in code parallelized with DSWP and GREMIO. This reduction in communication translates to performance gains of up to 40%.
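To make the min-cut idea concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes a value is produced in one thread inside a loop and consumed by another thread after the loop, and it treats a small control-flow graph whose edge weights are hypothetical profile counts as a flow network. A minimum cut between the defining block and the using block then picks the cheapest edges on which to place communication. The block names (`body`, `header`, `exit`, `use`), the counts, and the helper `min_cut` are all illustrative assumptions.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (cut_weight, cut_edges).

    capacity: dict mapping node -> {successor: weight} (directed graph).
    """
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {}
    for u, nbrs in capacity.items():
        residual.setdefault(u, {})
        for v, c in nbrs.items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = bfs_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

    # Nodes still reachable from the source form its side of the cut.
    reachable = set(parent)
    cut = sorted(
        (u, v)
        for u, nbrs in capacity.items()
        for v in nbrs
        if u in reachable and v not in reachable
    )
    return total, cut


# Hypothetical CFG: the value is defined in the loop body (one thread)
# and used after the loop (another thread). Weights are made-up counts.
cfg = {
    "header": {"body": 1000, "exit": 10},
    "body": {"header": 1000},
    "exit": {"use": 10},
}

cost, placement = min_cut(cfg, "body", "use")
print(cost, placement)
```

Under these assumed counts, sending the value right at its definition would execute 1000 times, whereas the cut selects the loop-exit edge, which executes 10 times. The sketch covers only a single register-like dependence and ignores the thread-aware data-flow analyses the abstract mentions.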
Keywords
global multi-threaded instruction scheduling,multi-threaded code,mtcg algorithm,dynamic communication instruction,communication translates,inter-thread dependence,synchronization instruction,multi-threaded application,communication optimizations,graph min-cut algorithm,algorithm inserts communication,communication,multi threading,instruction scheduling,data flow analysis,synchronization,satisfiability,code generation,data flow