Introduction to the X10 Implementation of NPB MG

msra



Abstract

X10 is a new Partitioned Global Address Space (PGAS) language being developed at IBM as part of the DARPA HPCS project. (Cray's Chapel and Sun's Fortress are the two other languages being developed in the DARPA project.) X10 is based on Java, with extensions for large-scale and heterogeneous parallel programming. The fundamental distinction between X10 and other PGAS languages is that its model of parallelism is not Single Program Multiple Data (SPMD). X10 supports fine-grained parallelism (both data and task parallelism) in order to meet the challenge posed by the explosion of hardware parallelism in emerging computer architectures. The unit of computation in X10 is an activity (a lightweight thread) running at a place; a place may be considered a virtual shared-memory multiprocessor (SMP). The number of activities in each place varies dynamically, and activities can spawn other activities locally or remotely. In this report, we explore ways to fully express in X10 the logical parallelism in numerical algorithms, through examples found in computational kernels such as Conjugate Gradient (CG), LU factorization (LU), and Multigrid (MG). Issues regarding how X10 handles the hierarchical and heterogeneous nature of emerging large-scale computer platforms will be addressed in later studies.

The NPB MG benchmark (1) uses a V-cycle Multigrid algorithm to solve Poisson's equation on a rectangular domain with periodic boundary conditions. It involves applying a set of stencil operations sequentially on the grids at each level of refinement. In the V-cycle, the computation starts from the top (finest) refinement level, goes down level by level toward the bottom, and then goes back up to the top. The stencil operations are restriction, prolongation, evaluation of the residual, and point relaxation. They are implemented in class MGOP. The overall implementation includes classes MGDriver, MGOP, LevelData, and Util.
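
The V-cycle described above can be illustrated with a minimal sketch. This is not the NPB MG code or the X10 implementation: it assumes a 1D periodic grid instead of the benchmark's 3D domain, weighted Jacobi as the point relaxation, full-weighting restriction, and linear-interpolation prolongation. All function names are hypothetical and are chosen only to mirror the four stencil operations.

```python
import math

def residual(u, f, h):
    """Evaluate r = f - A u, where A u = (2u_i - u_{i-1} - u_{i+1}) / h^2 (periodic)."""
    n = len(u)
    return [f[i] - (2.0 * u[i] - u[i - 1] - u[(i + 1) % n]) / (h * h)
            for i in range(n)]

def relax(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Point relaxation: weighted Jacobi sweeps (an assumed smoother)."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * 0.5 * h * h * r[i] for i in range(len(u))]
    return u

def restrict(r):
    """Full-weighting restriction onto the next coarser periodic grid."""
    n = len(r)
    return [0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[(2 * j + 1) % n]
            for j in range(n // 2)]

def prolong(e):
    """Linear-interpolation prolongation onto the next finer periodic grid."""
    m = len(e)
    fine = [0.0] * (2 * m)
    for j in range(m):
        fine[2 * j] = e[j]
        fine[2 * j + 1] = 0.5 * (e[j] + e[(j + 1) % m])
    return fine

def v_cycle(u, f, h):
    """One V-cycle: finest level down to the coarsest, then back up."""
    n = len(u)
    if n <= 2:
        # Coarsest level: a few extra relaxation sweeps stand in for a solve.
        return relax(u, f, h, sweeps=8)
    u = relax(u, f, h)                                  # pre-smoothing
    coarse_rhs = restrict(residual(u, f, h))            # go down one level
    correction = v_cycle([0.0] * (n // 2), coarse_rhs, 2.0 * h)
    fine_corr = prolong(correction)                     # come back up
    u = [u[i] + fine_corr[i] for i in range(n)]
    return relax(u, f, h)                               # post-smoothing

def norm(v):
    """Root-mean-square norm of a grid vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

# Usage: a zero-mean right-hand side, as a periodic Poisson problem requires.
n = 32
h = 1.0 / n
f = [math.sin(2.0 * math.pi * i * h) for i in range(n)]
u = [0.0] * n
r0 = norm(residual(u, f, h))
u = v_cycle(u, f, h)
r1 = norm(residual(u, f, h))
```

In this sketch each level applies its operations one after another, matching the sequential ordering described above; the loops inside each operation are independent per grid point, which is where data parallelism would be expressed.
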
In MGDriver, the test problems are set up and the solver defined in MGOP is called. The distributed array is implemented in LevelData. Constants and commonly used methods are defined in Util. The benchmark was first implemented in the usual SPMD style; further parallelism that can be expressed explicitly in X10, and that is by comparison hard to achieve in SPMD languages, was then added. We use abstract performance metrics to quantify the parallelism we add: the length of the critical execution path and the ideal speedup, defined as the ratio of the total computation cost (communication cost not included) to the length of the critical path.
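
The two abstract metrics can be made concrete with a small sketch over a task graph. The task names, costs, and dependencies below are hypothetical and are not measurements from the report; the sketch only assumes that the computation can be modelled as a directed acyclic graph of tasks with known computation costs and that communication cost is ignored.

```python
def critical_path_length(cost, deps):
    """Length of the longest cost-weighted dependency chain in a task DAG."""
    finish = {}

    def finish_time(task):
        # Earliest finish = latest finish among prerequisites + own cost.
        if task not in finish:
            start = max((finish_time(d) for d in deps.get(task, ())), default=0.0)
            finish[task] = start + cost[task]
        return finish[task]

    return max(finish_time(t) for t in cost)

def ideal_speedup(cost, deps):
    """Total computation cost divided by the critical path length."""
    return sum(cost.values()) / critical_path_length(cost, deps)

# Hypothetical example: four independent relaxation blocks of cost 1,
# followed by one residual evaluation of cost 2 that needs all of them.
cost = {"relax0": 1.0, "relax1": 1.0, "relax2": 1.0, "relax3": 1.0,
        "residual": 2.0}
deps = {"residual": ["relax0", "relax1", "relax2", "relax3"]}

path = critical_path_length(cost, deps)   # 1 (any relax block) + 2 (residual)
speedup = ideal_speedup(cost, deps)       # total cost 6 over path length 3
```

For this toy graph the total cost is 6 and the critical path is 3, so the ideal speedup is 2: no number of processors can finish sooner than the longest dependency chain.
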
