Improving locality using loop and data transformations in an integrated framework

msra(1998)

Abstract
This paper presents a new integrated compiler framework for improving the cache performance of scientific applications. In addition to applying loop transformations, the method includes data layout optimizations, i.e., those that change the memory layouts of data structures (arrays in this case). A key characteristic of this approach is that loop transformations are used to improve temporal locality while data layout optimizations are used to improve spatial locality. This optimization framework was used with sixteen loop nests from several benchmarks and math libraries, and the performance was measured using a cache simulator in addition to using a single node of the SGI Origin 2000 distributed-shared-memory machine for measuring actual execution times. The results demonstrate that this approach is very effective in improving locality and outperforms current solutions that use either loop or data transformations alone. We expect that our solution will also enable better register usage due to increased temporal locality in the innermost loop, and that it will help in eliminating false-sharing on multiprocessors due to exploiting spatial locality in the innermost loop.

High performance computers of today extensively use multiple levels of memory hierarchies. This renders the performance of applications critically dependent on their memory access characteristics. In particular, careful choice of memory-sensitive data layouts and code restructuring appear to be crucial. Unfortunately, the lack of automatic tools forces many users and in particular library writers to manually restructure their code. The problem is exacerbated by the increasingly sophisticated nature of applications. Manual restructuring requires a clear understanding of the impact of the machine architecture, is tedious and error-prone, and results in severely reduced portability.
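As a rough illustration of why a loop transformation alone or a data layout change alone may not be enough, the sketch below computes the memory stride seen by the innermost loop for two array references. It is not the paper's algorithm: the loop nest (references `U[i][j]` and `V[j][i]`), the array size `N`, and the helper names `linear_offset` and `innermost_stride` are all assumptions made up for this example. A stride of 1 stands for good spatial locality.

```python
def linear_offset(row, col, n_rows, n_cols, layout):
    """Offset (in elements) of A[row][col] inside a flat buffer."""
    if layout == "row_major":
        return row * n_cols + col
    if layout == "column_major":
        return col * n_rows + row
    raise ValueError(f"unknown layout: {layout}")


def innermost_stride(subscripts, inner, n_rows, n_cols, layout):
    """Distance between two consecutive accesses of one array reference.

    subscripts: loop variables used as (row, col) subscripts, e.g. ("j", "i")
    inner:      name of the loop variable of the innermost loop
    """
    def offset(values):
        row, col = values[subscripts[0]], values[subscripts[1]]
        return linear_offset(row, col, n_rows, n_cols, layout)

    start = {"i": 0, "j": 0}
    step = dict(start)
    step[inner] += 1  # advance the innermost loop by one iteration
    return offset(step) - offset(start)


N = 8  # hypothetical array extent

# Original nest: for i: for j: ... U[i][j] ... V[j][i] ..., both row-major.
original = (
    innermost_stride(("i", "j"), "j", N, N, "row_major"),
    innermost_stride(("j", "i"), "j", N, N, "row_major"),
)

# Loop interchange only (i becomes innermost): V improves, U gets worse.
interchanged = (
    innermost_stride(("i", "j"), "i", N, N, "row_major"),
    innermost_stride(("j", "i"), "i", N, N, "row_major"),
)

# Keep the loop order, but store V column-major: both references are unit stride.
relayout = (
    innermost_stride(("i", "j"), "j", N, N, "row_major"),
    innermost_stride(("j", "i"), "j", N, N, "column_major"),
)

print(original, interchanged, relayout)
```

In this toy nest, interchanging the loops only moves the large stride from one array to the other, whereas changing the layout of `V` alone gives both references unit stride. Real loop nests also involve legality constraints and temporal reuse, which this sketch ignores.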
In this paper we present and evaluate a compiler framework for improving the cache performance of scientific applications using a careful combination of loop transformations and data layout optimizations. The kinds of data layout optimizations considered here include memory layout changes such as row-major or column-major storage of multi-dimensional arrays (which are common data structures in regular scientific applications). We will refer to data layout optimizations as data transformations.

Traditionally, loop transformations (4, 8, 14, 17, 21) have been the main techniques used to improve locality by changing the access pattern as a result of changing the order of execution of loop iterations. The effect of loop transformations is local, i.e., a loop transformation affects only the loop nest to which it is applied, and both temporal and spatial locality may improve as a result. But loop transformations are not always legal, and they affect all arrays in a loop nest, some of them perhaps adversely. In a sense, loop transformations impact locality indirectly as a result of changing