
Parallel Multilevel k-Way Partitioning Scheme for Irregular Graphs.

SIAM Review, no. 2 (1999): 278-300


Abstract

In this paper we present a parallel formulation of a multilevel k-way graph partitioning algorithm. A key feature of this parallel formulation is that it is able to achieve a high degree of concurrency while maintaining the high quality of the partitions produced by the serial multilevel k-way partitioning algorithm. In particular, the ti...

Introduction
  • Graph partitioning is an important problem that has extensive applications in many areas, including scientific computing, very large scale integration (VLSI) design, geographical information systems, operations research, and task scheduling.
  • A number of researchers have investigated a class of algorithms in which the original graph is successively coarsened until it has only a small number of vertices, a partitioning of this coarsened graph is computed, and this initial partitioning is successively refined using a Kernighan–Lin (KL) type heuristic as it is projected back to the original graph.
  • This multilevel paradigm was studied independently by Bui and Jones [4] in the context of computing fill-reducing matrix reordering, by Hendrickson and Leland [13] in the context of finite element grid partitioning, and by Hauck and Borriello [11] and Cong and Smith [5] for hypergraph partitioning.
  • Multilevel k-way partitioning techniques are generally faster and provide better quality solutions than multilevel recursive bisection schemes [18].
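The coarsen, partition, and refine cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it contracts a random maximal matching, seeds the coarsest graph with a simple greedy assignment, and uses a greedy single-pass move heuristic as a stand-in for the KL-type refinement. All function names, the `coarsest` threshold, and the `imbalance` tolerance are hypothetical.

```python
import random

def edge_cut(adj, part):
    """Total weight of edges whose endpoints fall in different parts."""
    # adj is symmetric, so every cut edge is counted twice
    return sum(w for u in adj for v, w in adj[u].items()
               if part[u] != part[v]) // 2

def coarsen(adj, vwgt, rng):
    """Contract a random maximal matching (assumption: not heavy-edge)."""
    cmap, n_coarse = {}, 0
    order = list(adj)
    rng.shuffle(order)
    for u in order:
        if u in cmap:
            continue
        cmap[u] = n_coarse
        for v in adj[u]:            # pair u with any unmatched neighbor
            if v not in cmap:
                cmap[v] = n_coarse
                break
        n_coarse += 1
    c_adj = {c: {} for c in range(n_coarse)}
    c_wgt = {c: 0 for c in range(n_coarse)}
    for u in adj:
        cu = cmap[u]
        c_wgt[cu] += vwgt[u]
        for v, w in adj[u].items():
            cv = cmap[v]
            if cu != cv:            # edges inside a matched pair disappear
                c_adj[cu][cv] = c_adj[cu].get(cv, 0) + w
    return c_adj, c_wgt, cmap

def refine(adj, vwgt, part, k, max_wgt):
    """One greedy pass: move a vertex only if the cut strictly drops."""
    pw = [0] * k
    for u, p in part.items():
        pw[p] += vwgt[u]
    for u in adj:
        conn = [0] * k              # edge weight from u into each part
        for v, w in adj[u].items():
            conn[part[v]] += w
        src = part[u]
        dst = max(range(k), key=conn.__getitem__)
        if conn[dst] > conn[src] and pw[dst] + vwgt[u] <= max_wgt:
            pw[src] -= vwgt[u]
            pw[dst] += vwgt[u]
            part[u] = dst

def multilevel_kway(adj, k, seed=0, coarsest=8, imbalance=1.1):
    rng = random.Random(seed)
    vwgt = {u: 1 for u in adj}
    max_wgt = imbalance * len(adj) / k
    levels = []
    while len(adj) > max(coarsest, k):
        c_adj, c_wgt, cmap = coarsen(adj, vwgt, rng)
        if len(c_adj) == len(adj):  # nothing left to contract
            break
        levels.append((adj, vwgt, cmap))
        adj, vwgt = c_adj, c_wgt
    # initial partition: heaviest coarse vertex goes to the lightest part
    part, pw = {}, [0] * k
    for u in sorted(adj, key=lambda x: -vwgt[x]):
        p = pw.index(min(pw))
        part[u] = p
        pw[p] += vwgt[u]
    refine(adj, vwgt, part, k, max_wgt)
    # uncoarsening: project to the finer graph, then refine there
    for f_adj, f_wgt, cmap in reversed(levels):
        part = {u: part[cmap[u]] for u in f_adj}
        refine(f_adj, f_wgt, part, k, max_wgt)
    return part

def grid_graph(n):
    """n x n mesh with unit edge weights; vertex id = row * n + col."""
    adj = {r * n + c: {} for r in range(n) for c in range(n)}
    for r in range(n):
        for c in range(n):
            u = r * n + c
            if c + 1 < n:
                adj[u][u + 1] = adj[u + 1][u] = 1
            if r + 1 < n:
                adj[u][u + n] = adj[u + n][u] = 1
    return adj

mesh = grid_graph(8)
part = multilevel_kway(mesh, k=4)
print("edge cut:", edge_cut(mesh, part))
```

Because projection preserves the cut and each refinement move strictly lowers it, the cut never grows during uncoarsening. The sketch does not repair an unbalanced initial partition; it only avoids making the balance worse.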
Highlights
  • Graph partitioning is an important problem that has extensive applications in many areas, including scientific computing, very large scale integration (VLSI) design, geographical information systems, operations research, and task scheduling
  • We evaluated the performance of our parallel multilevel k-way graph partitioning algorithm on a wide range of graphs arising in different application domains
  • In this paper we presented a parallel formulation of the multilevel k-way partitioning algorithm
  • We show that the algorithm is scalable for a class of graphs that includes the commonly used finite element meshes
  • The time taken by our parallel graph partitioning algorithm is only slightly longer than the time taken for rearrangement of the graph among processors according to the new partition
  • We know from Table 6.3 that, for AUTO, going from a 16-way to a 128-way partition, the run time increases from 48.49 seconds to 54.61 seconds, a 12.6% increase
  • Experiments with a variety of finite element graphs show that our parallel formulation produces high-quality partitioning in a short amount of time
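The 12.6% figure quoted above for AUTO can be checked directly from the two run times. The snippet below is just that arithmetic; the variable names are illustrative.

```python
# Run times quoted above for AUTO, in seconds
t_16way, t_128way = 48.49, 54.61

increase_pct = (t_128way - t_16way) / t_16way * 100
parts_ratio = 128 / 16   # how much the number of parts grows

print(f"run time grows {increase_pct:.1f}% while k grows {parts_ratio:.0f}x")
```

In other words, an eightfold increase in the number of parts costs only about an eighth more run time.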
Results
  • The authors evaluated the performance of the parallel multilevel k-way graph partitioning algorithm on a wide range of graphs arising in different application domains.
  • The authors implemented the parallel multilevel algorithm on a 128-processor Cray T3D parallel computer.
  • Each processor on the T3D is a 150 MHz DEC Alpha (EV4).
Conclusion
  • In this paper the authors presented a parallel formulation of the multilevel k-way partitioning algorithm.
  • Even though the parallel formulation is implemented for non-cache-coherent shared-address space architectures such as the Cray T3D, the formulation can be adapted for message passing architectures such as the IBM SP2.
  • On such architectures, all interactions are done via message passing that has much larger startup latency than that of the one-sided communication operations on architectures such as the Cray T3D.
  • Because of this less favorable ratio of computation to communication, comparable efficiency will be obtained only for proportionately larger graphs.
Tables
  • Table 1: Various graphs used in evaluating the parallel multilevel k-way graph partitioning algorithm.
  • Table 2: The performance of the parallel multilevel k-way partitioning algorithm on the Cray T3D. For each graph, the performance is shown for 16-, 32-, 64-, and 128-way partitions on 16, 32, 64, and 128 processors, respectively. The times are in seconds.
  • Table 3: The performance of the serial multilevel k-way partitioning algorithm. For each graph, the performance is shown for 16-, 32-, 64-, and 128-way partitions. The times are in seconds on an SGI Challenge workstation.
  • Table 4: The amount of time (in seconds) required by the different phases of the parallel partitioning algorithm for some graphs, on 16 and 128 processors.
  • Table 5: The amount of time (in seconds) required by the different phases of the parallel partitioning algorithm for different initial vertex distributions, on 16 and 128 processors. The columns labeled “Rand” correspond to a random distribution of the graph, whereas the columns labeled “PrePart” correspond to a prepartitioned distribution of the graph.
Related work
  • Of the three phases of the multilevel k-way partitioning algorithm described in section 2, the coarsening and the uncoarsening phases require the bulk of the computation (over 95%). Hence, it is critical for any efficient parallel formulation of the multilevel k-way partitioning algorithm to successfully parallelize these two phases. In the following, we review the difficulties encountered in parallelizing these phases, and previous related work.

    Coarsening. Recall that, during the coarsening phase (section 2.1), a matching of the edges is computed, and this is used to contract the graph. One possible way of computing the matching in parallel is to have each processor only compute matchings between the vertices that it stores locally, and to use these local matchings to contract the graph. Since each pair of matched vertices resides on the same processor, this approach requires no communication during the contraction step. This approach works well as long as each processor stores relatively well connected portions of the entire graph. In particular, if the graph is distributed among the processors in a partitioned fashion, then this approach works extremely well. This is not a realistic assumption in many cases, since a good partitioning of the graph is what we are trying to compute by the multilevel k-way partitioner. Nevertheless, this approach of local matchings can work reasonably well when the number of processors used is small relative to the size of the graph and the average degree of the graph is relatively high.
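    The effect of the vertex distribution on local matchings can be illustrated with a small serial simulation. The sketch below is an assumption-laden toy, not the paper's parallel code: processors are modeled by an `owner` map, a vertex may only be matched with an unmatched neighbor that has the same owner, and a 16 x 16 mesh is distributed over 16 hypothetical processors either in 4 x 4 blocks or at random. All names are illustrative.

```python
import random

def local_matching(adj, owner, rng):
    """Maximal matching that only pairs vertices stored on the same processor."""
    mate = {}
    order = list(adj)
    rng.shuffle(order)
    for u in order:
        if u in mate:
            continue
        mate[u] = u                  # stays single unless a local neighbor is free
        for v in adj[u]:
            if v not in mate and owner[v] == owner[u]:
                mate[u], mate[v] = v, u
                break
    return mate

def matched_pairs(mate):
    """Number of matched pairs; each pair is counted once."""
    return sum(1 for u, v in mate.items() if u < v)

def grid_graph(n):
    """n x n mesh as adjacency lists; vertex id = row * n + col."""
    adj = {r * n + c: [] for r in range(n) for c in range(n)}
    for r in range(n):
        for c in range(n):
            u = r * n + c
            if c + 1 < n:
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < n:
                adj[u].append(u + n)
                adj[u + n].append(u)
    return adj

n = 16
adj = grid_graph(n)

# Prepartitioned distribution: each processor owns one 4 x 4 block of the mesh
block_owner = {r * n + c: (r // 4) * 4 + c // 4
               for r in range(n) for c in range(n)}

# Random distribution: same number of vertices per processor, shuffled
owners = list(block_owner.values())
random.Random(0).shuffle(owners)
random_owner = dict(zip(adj, owners))

mate_block = local_matching(adj, block_owner, random.Random(1))
mate_rand = local_matching(adj, random_owner, random.Random(1))
pairs_block = matched_pairs(mate_block)
pairs_rand = matched_pairs(mate_rand)

print("coarse vertices, prepartitioned:", len(adj) - pairs_block)
print("coarse vertices, random:        ", len(adj) - pairs_rand)
```

    With the block distribution most vertices find a local partner, so the contracted graph shrinks substantially. With the random distribution few neighbors share a processor, so the graph barely shrinks, which is the difficulty the paragraph above describes.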
Funding
  • This work was supported by NSF CCR-9423082, by Army Research Office contract DA/DAAH04-95-1-0538, and by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement DAAH04-95-2-0003/contract DAAH04-95-C-0008.
References
  • [1] S. T. Barnard, PMRSB: Parallel multilevel recursive spectral bisection, in Supercomputing 1995, ACM and IEEE Computer Society, San Diego, CA, 1995.
  • [2] S. T. Barnard and H. Simon, A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 627–632.
  • [3] S. T. Barnard and H. D. Simon, A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, in Proc. Sixth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1993, pp. 711–718.
  • [4] T. N. Bui and C. Jones, A heuristic for reducing fill-in in sparse matrix factorization, in Proc. Sixth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1993, pp. 445–452.
  • [5] J. Cong and M. L. Smith, A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design, in Proc. ACM/IEEE Design Automation Conference, Dallas, TX, 1993, pp. 755–760.
  • [6] K. D. Devine and J. E. Flaherty, Dynamic load balancing for parallel finite element methods with adaptive h- and p-refinement, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 593–598.
  • [7] P. Diniz, S. Plimpton, B. Hendrickson, and R. Leland, Parallel algorithms for dynamically partitioning unstructured grids, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 615–620.
  • [8] A. George, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal., 10 (1973), pp. 345–363.
  • [9] J. R. Gilbert and E. Zmijewski, A parallel graph partitioning algorithm for a message-passing multiprocessor, Internat. J. Parallel Programming, (1987), pp. 498–513.
  • [10] A. Gupta, G. Karypis, and V. Kumar, Highly scalable parallel algorithms for sparse matrix factorization, IEEE Trans. Parallel and Distributed Systems, 8 (1997), pp. 502–520; also available online from http://www.cs.umn.edu/~karypis.
  • [11] S. Hauck and G. Borriello, An evaluation of bipartitioning techniques, in Proc. Chapel Hill Conf. on Advanced Research in VLSI, IEEE Computer Society, San Diego, CA, 1995.
  • [12] M. T. Heath and P. Raghavan, A Cartesian parallel nested dissection algorithm, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 235–253.
  • [13] B. Hendrickson and R. Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. Rep. SAND93-1301, Sandia National Laboratories, Albuquerque, NM, 1993.
  • [14] Z. Johan, K. K. Mathur, S. L. Johnsson, and T. J. R. Hughes, Finite Element Methods on the Connection Machine CM-5 System, Tech. Rep., Thinking Machines Corporation, Burlington, MA, 1993.
  • [15] M. T. Jones and P. E. Plassmann, A parallel graph coloring heuristic, SIAM J. Sci. Comput., 14 (1993), pp. 654–669.
  • [16] G. Karypis and V. Kumar, Fast Sparse Cholesky Factorization on Scalable Parallel Computers, Tech. Rep., Department of Computer Science, University of Minnesota, Minneapolis, 1994; a short version appears in the Eighth Symposium on the Frontiers of Massively Parallel Computation, IEEE Computer Society, San Diego, CA, 1995. Also available online from http://www.cs.umn.edu/~karypis.
  • [17] G. Karypis and V. Kumar, MeTiS 3.0: Unstructured Graph Partitioning and Sparse Matrix Ordering System, Tech. Rep. 97-061, Department of Computer Science, University of Minnesota, Minneapolis, 1997; also available online from http://www.cs.umn.edu/~metis.
  • [18] G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, J. Parallel Distrib. Comput., 48 (1998), pp. 96–129; also available online from http://www.cs.umn.edu/~karypis.
  • [19] G. Karypis and V. Kumar, A parallel algorithm for multilevel graph partitioning and sparse matrix ordering, J. Parallel Distrib. Comput., 48 (1998), pp. 71–95; also available online from http://www.cs.umn.edu/~karypis. A short version appears in Proc. Internat. Parallel Processing Symposium, CRC Press, Boca Raton, FL, 1996.
  • [20] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20 (1998), pp. 359–392; also available online from http://www.cs.umn.edu/~karypis. A short version appears in Proc. Internat. Conf. on Parallel Processing, CRC Press, Boca Raton, FL, 1995.
  • [21] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Tech. J., 49 (1970), pp. 291–307.
  • [22] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, 1994.
  • [23] M. Luby, A simple parallel algorithm for the maximal independent set problem, SIAM J. Comput., 15 (1986), pp. 1036–1053.
  • [24] A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 430–452.
  • [25] P. Raghavan, Parallel Ordering Using Edge Contraction, Tech. Rep. CS-95-293, Department of Computer Science, University of Tennessee, Knoxville, 1995.
  • [26] E. Rothberg, Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers, in Proc. 1994 Scalable High Performance Computing Conference, IEEE Computer Society, San Diego, CA, 1994.
  • [27] K. Schloegel, G. Karypis, and V. Kumar, Multilevel diffusion algorithms for repartitioning of adaptive meshes, J. Parallel Distrib. Comput., 47 (1997), pp. 109–124; also available online from http://www.cs.umn.edu/~karypis.
  • [28] R. V. Shankar and S. Ranka, Random data accesses on coarse-grained parallel machine, J. Parallel Distrib. Comput., 44 (1997), pp. 24–34.
  • [29] C. Walshaw, M. Cross, and M. G. Everett, Dynamic load-balancing for parallel adaptive unstructured meshes, in Proc. Eighth SIAM Conference on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1997.