# SCAN: a structural clustering algorithm for networks

KDD, pp.824-833, (2007)

Abstract

Network clustering (or graph partitioning) is an important task for the discovery of underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play speci...

Introduction

- Much data of current interest to the scientific community can be modeled as networks.
- The world-wide web can be modeled as a graph in which web pages are vertices, connected by an edge when one page contains a hyperlink to another [2][3].
- Modularity-based algorithms [6][11][12] and normalized cut [4][5] are successful examples of network clustering algorithms.
- However, they do not distinguish the roles of the vertices in the network.

Highlights

- The performance of SCAN is compared with FastModularity, a fast modularity-based network clustering algorithm proposed by Clauset et al. [12] that is faster than many competing algorithms: its running time on a graph with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the hierarchical cluster structure
- Network clustering is a fundamental task in many fields of science and engineering
- Many algorithms have been proposed by practitioners in different disciplines, including computer science and physics
- Identifying hubs is essential for applications such as viral marketing and epidemiology
- As vertices that bridge clusters, hubs are responsible for spreading ideas or disease

Results

- The authors evaluate the algorithm SCAN using both synthetic and real datasets.
- The performance of SCAN is compared with FastModularity, a fast modularity-based network clustering algorithm proposed by Clauset et al. [12] that is faster than many competing algorithms: its running time on a graph with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the hierarchical cluster structure.
- To evaluate the computational efficiency of the proposed algorithm, the authors generate ten graphs with the number of vertices ranging from 1,000 to 1,000,000 and the number of edges ranging from 2,182 to 2,000,190.
- An example of a generated graph is presented in Figure 3.

Conclusion

- Network clustering is a fundamental task in many fields of science and engineering. Many algorithms have been proposed by practitioners in different disciplines, including computer science and physics.
- Successful examples are Min-Max Cut [4] and Normalized Cut [5], as well as modularity-based algorithms [6][11][12].
- While such algorithms can successfully detect clusters in networks, they tend to fail to identify and isolate two kinds of vertices that play special roles – vertices that bridge clusters and vertices that are marginally connected to clusters.
- Outliers have little or no influence and may be isolated as noise in the data.
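
The distinction between vertices that bridge clusters (hubs) and marginally connected vertices (outliers) can be illustrated with a small sketch. It assumes a clustering has already been computed and labels each unclustered vertex a hub if its neighbors fall in two or more different clusters, and an outlier otherwise. The function name, the data layout, and the toy graph are illustrative assumptions, not code from the paper.

```python
from typing import Dict, Hashable, Set


def classify_unclustered(
    adjacency: Dict[Hashable, Set[Hashable]],
    cluster_of: Dict[Hashable, int],
) -> Dict[Hashable, str]:
    """Label every vertex that has no cluster as 'hub' or 'outlier'.

    Assumption: a hub is an unclustered vertex whose neighbors belong to
    at least two different clusters; any other unclustered vertex is an outlier.
    """
    roles: Dict[Hashable, str] = {}
    for vertex, neighbors in adjacency.items():
        if vertex in cluster_of:
            continue  # clustered vertices keep their cluster membership
        # Distinct clusters reachable through a single edge from this vertex.
        touched = {cluster_of[u] for u in neighbors if u in cluster_of}
        roles[vertex] = "hub" if len(touched) >= 2 else "outlier"
    return roles


# Toy graph: two triangles, one vertex "h" bridging them, one vertex "o" on the fringe.
adjacency = {
    "a": {"b", "c", "h"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y", "z", "h"}, "y": {"x", "z"}, "z": {"x", "y", "o"},
    "h": {"a", "x"},
    "o": {"z"},
}
cluster_of = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

print(classify_unclustered(adjacency, cluster_of))  # {'h': 'hub', 'o': 'outlier'}
```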

- Table 1: Adjusted Rand Index comparison
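
The Adjusted Rand Index named in the table caption measures the agreement between two clusterings, corrected for chance. A minimal sketch of the standard pair-counting formula is given here; the function and variable names are illustrative, and the labels in the usage lines are made-up toy data, not results from the paper.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Adjusted Rand Index of two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs placed together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each labeling on its own (row and column sums).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are the same trivial partition
    return (together_both - expected) / (maximum - expected)


# Identical partitions score 1.0 even when the label names differ.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0
# Partitions that disagree on every pair score below zero.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # -0.5
```

A value of 1 means the two partitions match exactly, and values near 0 mean the agreement is no better than chance.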

Related work

- Network clustering (or graph partitioning) is the division of a graph into a set of sub-graphs, called clusters. More specifically, given a graph $G = \{V, E\}$, where $V$ is a set of vertices and $E$ is a set of edges between vertices, the goal of graph partitioning is to divide $G$ into $k$ disjoint sub-graphs $G_i = \{V_i, E_i\}$, in which $V_i \cap V_j = \emptyset$ for any $i \neq j$, and $V = \bigcup_{i=1}^{k} V_i$. The number of sub-graphs, $k$, may or may not be known a priori. In this paper, we focus on simple, undirected, and un-weighted graphs.
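
The two conditions in this definition (the vertex sets are pairwise disjoint and together cover all vertices) can be checked mechanically. The sketch below is only an illustration of the definition; `is_partition` is a hypothetical helper name and the vertex sets are toy data.

```python
from typing import Hashable, Iterable, Set


def is_partition(vertices: Set[Hashable], clusters: Iterable[Set[Hashable]]) -> bool:
    """Return True if the clusters are pairwise disjoint and their union is `vertices`."""
    seen: Set[Hashable] = set()
    for cluster in clusters:
        if seen & cluster:
            return False  # a vertex appears in two clusters, so they are not disjoint
        seen |= cluster
    return seen == vertices  # the union of all clusters must equal the vertex set


V = {1, 2, 3, 4, 5}
print(is_partition(V, [{1, 2}, {3, 4, 5}]))     # True
print(is_partition(V, [{1, 2}, {2, 3, 4, 5}]))  # False: vertex 2 is shared
print(is_partition(V, [{1, 2}, {3, 4}]))        # False: vertex 5 is not covered
```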

The problem of finding good clustering of networks has been studied for some decades in many fields, particularly computer science and physics. Here we review some of the more common methods.

References

- [1] S. Wasserman and K. Faust, “Social Network Analysis.” Cambridge University Press, Cambridge (1994).
- [2] R. Albert, H. Jeong, and A.-L. Barabási, “Diameter of the world-wide web.” Nature 401, 130–131 (1999).
- [3] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “The Web as a graph: Measurements, models and methods.” In Proceedings of the International Conference on Combinatorics and Computing, number 1627 in Lecture Notes in Computer Science, pp. 1–18, Springer, Berlin (1999).
- [4] C. Ding, X. He, H. Zha, M. Gu, and H. Simon, “A min-max cut algorithm for graph partitioning and data clustering.” Proc. of ICDM (2001).
- [5] J. Shi and J. Malik, “Normalized cuts and image segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8 (2000).
- [6] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks.” Nature 433, 895–900 (2005).
- [7] J. Kleinberg, “Authoritative sources in a hyperlinked environment.” Proc. 9th ACM-SIAM Symposium on Discrete Algorithms (1998).
- [8] P. Domingos and M. Richardson, “Mining the Network Value of Customers.” Proc. 7th ACM SIGKDD, pp. 57–66 (2001).
- [9] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos, “Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint.” SRDS 2003, pp. 25–34, Florence, Italy (2003).
- [10] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), Portland, OR, pp. 291–316. AAAI Press (1996).
- [11] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks.” Phys. Rev. E 69, 026113 (2004).
- [12] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks.” Physical Review E 70, 066111 (2004).
- [13] D. J. Watts and S. H. Strogatz, “Collective dynamics of 'small-world' networks.” Nature 393, 440–442 (1998).
- [14] W. M. Rand, “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical Association 66, pp. 846–850 (1971).
- [15] L. Hubert and P. Arabie, “Comparing Partitions.” Journal of Classification, 193–218 (1985).
- [16] G. W. Milligan and M. C. Cooper, “A study of the comparability of external criteria for hierarchical cluster analysis.” Multivariate Behavioral Research 21, 441–458 (1986).
- [17] http://cs.unm.edu/~aaron/research/fastmodularity.htm
- [18] http://www.orgnet.com/
- [19] http://www-personal.umich.edu/~mejn/netdata/
- [20] P. Erdös and A. Rényi, Publ. Math. (Debrecen) 6, 290 (1959).
- [21] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology.” SIGCOMM (1999).
- [22] A.-L. Barabási and Z. N. Oltvai, Nature Reviews Genetics 5, 101–113 (2004).
