# Parallel Multilevel k-Way Partitioning Scheme for Irregular Graphs

SIAM Review, no. 2 (1999): 278-300

Abstract

In this paper we present a parallel formulation of a multilevel k-way graph partitioning algorithm. A key feature of this parallel formulation is that it is able to achieve a high degree of concurrency while maintaining the high quality of the partitions produced by the serial multilevel k-way partitioning algorithm. In particular, the ti...

Introduction

- Graph partitioning is an important problem that has extensive applications in many areas, including scientific computing, very large scale integration (VLSI) design, geographical information systems, operations research, and task scheduling.
- A number of researchers have investigated a class of algorithms in which the original graph is successively coarsened until it has only a small number of vertices, a partitioning of this coarsened graph is computed, and this initial partitioning is successively refined using a Kernighan–Lin (KL) type heuristic as it is projected back to the original graph.
- This multilevel paradigm was studied independently by Bui and Jones [4] in the context of computing fill-reducing matrix reordering, by Hendrickson and Leland [13] in the context of finite element grid partitioning, and by Hauck and Borriello [11] and Cong and Smith [5] for hypergraph partitioning.
- Multilevel k-way partitioning techniques are generally faster and provide better quality solutions than multilevel recursive bisection schemes [18].
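The refinement step of this paradigm can be illustrated with a minimal Python sketch. It is neither the KL heuristic itself nor the authors' implementation: it is a simplified greedy pass that moves a vertex to a neighboring part when doing so lowers the edge-cut. The names `edge_cut`, `greedy_refine`, and `max_size`, and the adjacency-dictionary graph format, are assumptions made for illustration.

```python
from collections import defaultdict

def edge_cut(adj, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and part[u] != part[v])

def greedy_refine(adj, part, max_size):
    """One greedy pass (a simplification, not full KL): move each vertex to
    the neighboring part that most reduces the edge-cut, as long as the
    target part holds fewer than max_size vertices."""
    part = dict(part)
    size = defaultdict(int)
    for p in part.values():
        size[p] += 1
    for u in sorted(adj):
        conn = defaultdict(int)            # edge weight from u into each part
        for v, w in adj[u].items():
            conn[part[v]] += w
        home = part[u]
        internal = conn.get(home, 0)       # weight kept inside the home part
        best, best_gain = home, 0
        for p, w in conn.items():
            gain = w - internal            # cut reduction if u moves to p
            if p != home and gain > best_gain and size[p] < max_size:
                best, best_gain = p, gain
        if best != home:
            size[home] -= 1
            size[best] += 1
            part[u] = best
    return part

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3, unit weights.
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
start = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}   # vertex 2 is misplaced
refined = greedy_refine(adj, start, max_size=3)
print(edge_cut(adj, start), edge_cut(adj, refined))   # prints: 2 1
```

In a multilevel scheme a pass like this would run at every level while the partition is projected back from the coarsest graph to the original one; a real KL-type refinement also allows temporarily unfavorable moves, which this sketch omits.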

Highlights

- Graph partitioning is an important problem that has extensive applications in many areas, including scientific computing, very large scale integration (VLSI) design, geographical information systems, operations research, and task scheduling
- We evaluated the performance of our parallel multilevel k-way graph partitioning algorithm on a wide range of graphs arising in different application domains
- In this paper we presented a parallel formulation of the multilevel k-way partitioning algorithm
- We show that the algorithm is scalable for a class of graphs that includes the commonly used finite element meshes
- The time taken by our parallel graph partitioning algorithm is only slightly longer than the time taken for rearrangement of the graph among processors according to the new partition
- We know from Table 6.3 that, for AUTO, going from a 16-way to a 128-way partition, the run time increases from 48.49 seconds to 54.61 seconds, a 12.6% increase
- Experiments with a variety of finite element graphs show that our parallel formulation produces high-quality partitioning in a short amount of time
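The 12.6% figure quoted for AUTO follows directly from the two run times in that bullet. The short Python check below reproduces it; the variable names are illustrative only.

```python
# Run times for AUTO as quoted above: 16-way and 128-way partitions.
t16, t128 = 48.49, 54.61

increase = (t128 - t16) / t16      # relative growth in run time
parts_ratio = 128 // 16            # growth in the number of parts

print(f"{increase:.1%} more time for {parts_ratio}x as many parts")
# prints: 12.6% more time for 8x as many parts
```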

Results

- The authors evaluated the performance of the parallel multilevel k-way graph partitioning algorithm on a wide range of graphs arising in different application domains.
- The authors implemented the parallel multilevel algorithm on a 128-processor Cray T3D parallel computer.
- Each processor on the T3D is a 150 MHz DEC Alpha (EV4).

Conclusion

- In this paper the authors presented a parallel formulation of the multilevel k-way partitioning algorithm.
- Even though the parallel formulation is implemented for non-cache-coherent shared-address space architectures such as the Cray T3D, the formulation can be adapted for message passing architectures such as the IBM SP2.
- On such architectures, all interactions are done via message passing that has much larger startup latency than that of the one-sided communication operations on architectures such as the Cray T3D.
- Because of this worse ratio of computation to communication, comparable efficiency will be obtained only for proportionately larger graphs.

- Table 1: Various graphs used in evaluating the parallel multilevel k-way graph partitioning algorithm.
- Table 2: The performance of the parallel multilevel k-way partitioning algorithm on the Cray T3D. For each graph, the performance is shown for 16-, 32-, 64-, and 128-way partitions on 16, 32, 64, and 128 processors, respectively. The times are in seconds.
- Table 3: The performance of the serial multilevel k-way partitioning algorithm. For each graph, the performance is shown for 16-, 32-, 64-, and 128-way partitions. The times are in seconds on an SGI Challenge workstation.
- Table 4: The amount of time (in seconds) required by the different phases of the parallel partitioning algorithm for some graphs, on 16 and 128 processors.
- Table 5: The amount of time (in seconds) required by the different phases of the parallel partitioning algorithm for different initial vertex distributions, on 16 and 128 processors. The columns labeled “Rand” correspond to a random distribution of the graph, whereas the columns labeled “PrePart” correspond to a prepartitioned distribution of the graph.

Related Work

Of the three phases of the multilevel k-way partitioning algorithm described in section 2, the coarsening and the uncoarsening phases require the bulk of the computation (over 95%). Hence, it is critical for any efficient parallel formulation of the multilevel k-way partitioning algorithm to successfully parallelize these two phases. In the following, we review the difficulties encountered in parallelizing these phases, and previous related works.

Coarsening. Recall that, during the coarsening phase (section 2.1), a matching of the edges is computed, and this is used to contract the graph. One possible way of computing the matching in parallel is to have each processor only compute matchings between the vertices that it stores locally, and to use these local matchings to contract the graph. Since each pair of matched vertices resides on the same processor, this approach requires no communication during the contraction step. This approach works well as long as each processor stores relatively well connected portions of the entire graph. In particular, if the graph is distributed among the processors in a partitioned fashion, then this approach works extremely well. This is not a realistic assumption in many cases, since a good partitioning of the graph is what we are trying to compute by the multilevel k-way partitioner. Nevertheless, this approach of local matchings can work reasonably well when the number of processors used is small relative to the size of the graph and the average degree of the graph is relatively high.
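The local-matching idea described above can be sketched in Python as follows. This is a single-process simulation under stated assumptions, not the authors' parallel code: the `owner` map stands in for the distribution of vertices among processors, the heavy-edge preference when choosing a partner is an assumption, and the names `local_matching` and `contract` are hypothetical.

```python
def local_matching(adj, owner):
    """Match each vertex only with an unmatched neighbor stored on the
    same processor, so contraction needs no communication."""
    match = {}
    for u in sorted(adj):
        if u in match:
            continue
        cands = [(w, v) for v, w in adj[u].items()
                 if v != u and v not in match and owner[v] == owner[u]]
        if cands:
            _, v = max(cands)          # prefer the heaviest edge (assumption)
            match[u], match[v] = v, u
        else:
            match[u] = u               # no local partner: u stays single
    return match

def contract(adj, match):
    """Collapse each matched pair into one coarse vertex, summing the
    weights of edges that end up parallel."""
    cmap, next_id = {}, 0
    for u in sorted(adj):
        if u not in cmap:
            cmap[u] = cmap[match[u]] = next_id
            next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = cmap[u], cmap[v]
            if cu != cv:               # edges inside a pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, cmap

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3, unit weights.
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
partitioned = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # each triangle on one processor
scattered = {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}     # every edge crosses processors

coarse_part, _ = contract(adj, local_matching(adj, partitioned))
coarse_scat, _ = contract(adj, local_matching(adj, scattered))
print(len(coarse_part), len(coarse_scat))   # prints: 4 6
```

With the partitioned distribution the graph shrinks from 6 to 4 vertices, while the scattered distribution finds no local partner for any vertex and the graph does not shrink at all. This mirrors the point made above: local matching depends on each processor holding a well-connected portion of the graph.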

Funding

- This work was supported by NSF CCR-9423082, by Army Research Office contract DA/DAAH04-95-1-0538, and by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement DAAH04-95-2-0003/contract DAAH04-95-C-0008

References

- S. T. Barnard, PMRSB: Parallel multilevel recursive spectral bisection, in Supercomputing 1995, ACM and IEEE Computer Society, San Diego, CA, 1995.
- S. T. Barnard and H. Simon, A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 627–632.
- S. T. Barnard and H. D. Simon, A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, in Proc. Sixth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1993, pp. 711–718.
- T. N. Bui and C. Jones, A heuristic for reducing fill-in in sparse matrix factorization, in Proc. Sixth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1993, pp. 445–452.
- J. Cong and M. L. Smith, A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design, in Proc. ACM/IEEE Design Automation Conference, Dallas, TX, 1993, pp. 755–760.
- K. D. Devine and J. E. Flaherty, Dynamic load balancing for parallel finite element methods with adaptive h- and p-refinement, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 593–598.
- P. Diniz, S. Plimpton, B. Hendrickson, and R. Leland, Parallel algorithms for dynamically partitioning unstructured grids, in Proc. Seventh SIAM Conf. on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1995, pp. 615–620.
- A. George, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal., 10 (1973), pp. 345–363.
- J. R. Gilbert and E. Zmijewski, A parallel graph partitioning algorithm for a message-passing multiprocessor, Internat. J. Parallel Programming, (1987), pp. 498–513.
- A. Gupta, G. Karypis, and V. Kumar, Highly scalable parallel algorithms for sparse matrix factorization, IEEE Trans. Parallel and Distributed Systems, 8 (1997), pp. 502–520; also available online from http://www.cs.umn.edu/~karypis.
- S. Hauck and G. Borriello, An evaluation of bipartitioning technique, in Proc. Chapel Hill Conf. on Advanced Research in VLSI, IEEE Computer Society, San Diego, CA, 1995.
- M. T. Heath and P. Raghavan, A Cartesian parallel nested dissection algorithm, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 235–253.
- B. Hendrickson and R. Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. Rep. SAND93-1301, Sandia National Laboratories, Albuquerque, NM, 1993.
- Z. Johan, K. K. Mathur, S. L. Johnsson, and T. J. R. Hughes, Finite Element Methods on the Connection Machine CM-5 System, Tech. Rep., Thinking Machines Corporation, Burlington, MA, 1993.
- M. T. Jones and P. E. Plassmann, A parallel graph coloring heuristic, SIAM J. Sci. Comput., 14 (1993), pp. 654–669.
- G. Karypis and V. Kumar, Fast Sparse Cholesky Factorization on Scalable Parallel Computers, Tech. Rep., Department of Computer Science, University of Minnesota, Minneapolis, 1994; a short version appears in the Eighth Symposium on the Frontiers of Massively Parallel Computation, IEEE Computer Society, San Diego, CA, 1995. Also available online from http://www.cs.umn.edu/~karypis.
- G. Karypis and V. Kumar, MeTiS3.0: Unstructured Graph Partitioning and Sparse Matrix Ordering System, Tech. Rep. 97-061, Department of Computer Science, University of Minnesota, Minneapolis, 1997; also available online from http://www.cs.umn.edu/~metis.
- G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, J. Parallel Distrib. Comput., 48 (1998), pp. 96–129; also available online from http://www.cs.umn.edu/~karypis.
- G. Karypis and V. Kumar, A parallel algorithm for multilevel graph partitioning and sparse matrix ordering, J. Parallel Distrib. Comput., 48 (1998), pp. 71–95; also available online from http://www.cs.umn.edu/~karypis. A short version appears in Proc. Internat. Parallel Processing Symposium, CRC Press, Boca Raton, FL, 1996.
- G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20 (1998), pp. 359–392; also available online from http://www.cs.umn.edu/~karypis. A short version appears in Proc. Internat. Conf. on Parallel Processing, CRC Press, Boca Raton, FL, 1995.
- B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Tech. J., 49 (1970), pp. 291–307.
- V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, 1994.
- M. Luby, A simple parallel algorithm for the maximal independent set problem, SIAM J. Comput., 15 (1986), pp. 1036–1053.
- A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 430–452.
- P. Raghavan, Parallel Ordering Using Edge Contraction, Tech. Rep. CS-95-293, Department of Computer Science, University of Tennessee, Knoxville, 1995.
- E. Rothberg, Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers, in Proc. 1994 Scalable High Performance Computing Conference, IEEE Computer Society, San Diego, CA, 1994.
- K. Schloegel, G. Karypis, and V. Kumar, Multilevel diffusion algorithms for repartitioning of adaptive meshes, J. Parallel Distrib. Comput., 47 (1997), pp. 109–124; also available online from http://www.cs.umn.edu/~karypis.
- R. V. Shankar and S. Ranka, Random data accesses on coarse-grained parallel machine, J. Parallel Distrib. Comput., 44 (1997), pp. 24–34.
- C. Walshaw, M. Cross, and M. G. Everett, Dynamic load-balancing for parallel adaptive unstructured meshes, in Proc. Eighth SIAM Conference on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1997.
