We proposed a novel algorithm called SCAN, which detects clusters, hubs and outliers in networks

SCAN: a structural clustering algorithm for networks

KDD, pp.824-833, (2007)


Abstract

Network clustering (or graph partitioning) is an important task for the discovery of underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play speci...

Introduction
  • Much data of current interest to the scientific community can be modeled as networks.
  • The world-wide web can be modeled as a graph, where web pages are represented as vertices that are connected by an edge when one page contains a hyperlink to another [2][3].
  • Modularity-based algorithms [6][11][12] and normalized cut [4][5] are successful examples of network clustering methods.
  • However, they do not distinguish the roles of the vertices in the networks.
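
The web-as-graph model described above can be sketched as a small adjacency structure. This is only an illustrative sketch: the function name `build_undirected_graph` and the page names are hypothetical, and treating hyperlinks as undirected edges is an assumption made to match the simple, undirected graphs the paper focuses on.

```python
from collections import defaultdict


def build_undirected_graph(links):
    """Return an adjacency map: vertex -> set of neighbouring vertices."""
    adjacency = defaultdict(set)
    for source, target in links:
        if source == target:
            continue  # skip self-links so the graph stays simple
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


# Hypothetical hyperlinks, written as (page, page it links to).
links = [
    ("a.html", "b.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
    ("c.html", "d.html"),
]
graph = build_undirected_graph(links)
```

Storing neighbours as sets makes duplicate links harmless and keeps neighbourhood intersections cheap, which is convenient for clustering methods that compare the neighbourhoods of two vertices.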
Highlights
  • Much data of current interest to the scientific community can be modeled as networks
  • The performance of SCAN is compared with FastModularity, a fast modularity-based network clustering algorithm proposed by Clauset et al. [12], which is faster than many competing algorithms: its running time on a graph with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the hierarchical cluster structure
  • Network clustering is a fundamental task in many fields of science and engineering
  • Many algorithms have been proposed by practitioners in different disciplines including computer science and physics
  • Identifying hubs is essential for applications such as viral marketing and epidemiology
  • As vertices that bridge clusters, hubs are responsible for spreading ideas or disease
Results
  • The authors evaluate the algorithm SCAN using both synthetic and real datasets.
  • The performance of SCAN is compared with FastModularity, a fast modularity-based network clustering algorithm proposed by Clauset et al. [12], which is faster than many competing algorithms: its running time on a graph with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the hierarchical cluster structure.
  • To evaluate the computational efficiency of the proposed algorithm, the authors generate ten graphs with the number of vertices ranging from 1,000 to 1,000,000 and the number of edges ranging from 2,182 to 2,000,190.
  • An example of a generated graph is presented in Figure 3.
Conclusion
  • Network clustering is a fundamental task in many fields of science and engineering. Many algorithms have been proposed by practitioners in different disciplines including computer science and physics.
  • Successful examples are Min-Max Cut [4] and Normalized Cut [5], as well as Modularity-based algorithms [6][11][12].
  • While such algorithms can successfully detect clusters in networks, they tend to fail to identify and isolate two kinds of vertices that play special roles – vertices that bridge clusters and vertices that are marginally connected to clusters.
  • Outliers have little or no influence, and may be isolated as noise in the data
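
The distinction between the two special vertex roles can be illustrated with a minimal sketch. It assumes a clustering is already available and uses one plausible reading of the text: a vertex outside every cluster is a hub if its neighbours fall in two or more clusters (it bridges them), and an outlier otherwise. The function name, the toy graph and the rule itself are illustrative assumptions, not the authors' implementation.

```python
def classify_non_members(adjacency, cluster_of):
    """Label every vertex that belongs to no cluster as 'hub' or 'outlier'.

    adjacency:  dict mapping vertex -> set of neighbouring vertices
    cluster_of: dict mapping each clustered vertex -> its cluster id
    """
    roles = {}
    for vertex, neighbours in adjacency.items():
        if vertex in cluster_of:
            continue  # clustered vertices keep their cluster membership
        # Collect the distinct clusters this vertex is adjacent to.
        touched = {cluster_of[n] for n in neighbours if n in cluster_of}
        roles[vertex] = "hub" if len(touched) >= 2 else "outlier"
    return roles


# Toy graph: two triangles (clusters A and B), vertex 7 joins both,
# vertex 8 hangs off a single cluster.
adjacency = {
    1: {2, 3, 7}, 2: {1, 3}, 3: {1, 2},
    4: {5, 6, 7}, 5: {4, 6}, 6: {4, 5, 8},
    7: {1, 4},
    8: {6},
}
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
roles = classify_non_members(adjacency, cluster_of)
print(roles)  # {7: 'hub', 8: 'outlier'}
```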
Tables
  • Table 1: Adjusted Rand Index Comparison
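
For readers unfamiliar with the measure named in Table 1, the following is a minimal sketch of the Adjusted Rand Index in the standard pair-counting form associated with Hubert and Arabie [15]. It is not the authors' evaluation code; the function name and the handling of degenerate inputs (returning 1.0) are assumptions of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare (assumed convention)

    # Pairs of items placed together in both labelings, in the first only,
    # and in the second only.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2          # largest possible agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every pair is split
```

A value of 1 means the two clusterings agree exactly up to label names, values near 0 correspond to chance-level agreement, and negative values indicate less agreement than chance.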
Related work
  • Network clustering (or graph partitioning) is the division of a graph into a set of sub-graphs, called clusters. More specifically, given a graph $G = \{V, E\}$, where $V$ is a set of vertices and $E$ is a set of edges between vertices, the goal of graph partitioning is to divide $G$ into $k$ disjoint sub-graphs $G_i = \{V_i, E_i\}$, in which $V_i \cap V_j = \emptyset$ for any $i \neq j$, and $V = \bigcup_{i=1}^{k} V_i$. The number of sub-graphs, $k$, may or may not be known a priori. In this paper, we focus on simple, undirected, and unweighted graphs.

    The problem of finding good clustering of networks has been studied for some decades in many fields, particularly computer science and physics. Here we review some of the more common methods.
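
The partitioning conditions stated above (pairwise disjoint vertex sets whose union is the full vertex set) can be checked with a short sketch. The helper names `is_valid_partition` and `induced_edges` and the toy graph are hypothetical; taking each sub-graph's edge set to be the edges with both endpoints in the part is an assumption of this sketch.

```python
def is_valid_partition(vertices, parts):
    """True when the parts are pairwise disjoint and their union is `vertices`."""
    seen = set()
    for part in parts:
        part = set(part)
        if seen & part:
            return False  # some vertex appears in two parts
        seen |= part
    return seen == set(vertices)


def induced_edges(edges, part):
    """Edges with both endpoints inside `part` (the sub-graph's edge set)."""
    part = set(part)
    return [(u, v) for u, v in edges if u in part and v in part]


# Toy graph: two triangles joined by the single edge (3, 4).
V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
parts = [{1, 2, 3}, {4, 5, 6}]

valid = is_valid_partition(V, parts)
first_subgraph_edges = induced_edges(E, parts[0])
```

In this example the edge (3, 4) belongs to neither sub-graph, which is the kind of inter-cluster edge that cut-based and modularity-based methods try to keep rare.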
References
  • [1] S. Wasserman and K. Faust, “Social Network Analysis.” Cambridge University Press, Cambridge (1994).
  • [2] R. Albert, H. Jeong, and A.-L. Barabási, “Diameter of the world-wide web.” Nature 401, 130–131 (1999).
  • [3] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “The Web as a graph: Measurements, models and methods.” In Proceedings of the International Conference on Combinatorics and Computing, number 1627 in Lecture Notes in Computer Science, pp. 1–18, Springer, Berlin (1999).
  • [4] C. Ding, X. He, H. Zha, M. Gu, and H. Simon, “A min-max cut algorithm for graph partitioning and data clustering.” Proc. of ICDM 2001.
  • [5] J. Shi and J. Malik, “Normalized cuts and image segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, 2000.
  • [6] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks.” Nature 433, 895–900 (2005).
  • [7] J. Kleinberg, “Authoritative sources in a hyperlinked environment.” Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
  • [8] P. Domingos and M. Richardson, “Mining the Network Value of Customers.” Proc. 7th ACM SIGKDD, pp. 57–66, 2001.
  • [9] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos, “Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint.” SRDS 2003, pp. 25–34, Florence, Italy.
  • [10] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), Portland, OR, pages 291-316. AAAI Press, 1996.
  • [11] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks.” Phys. Rev. E 69, 026113 (2004).
  • [12] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks.” Physical Review E 70, 066111 (2004).
  • [13] D. J. Watts and S. H. Strogatz, “Collective dynamics of 'small-world' networks.” Nature 393, 440–442 (1998).
  • [14] W. M. Rand, “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical Association, 66, pp. 846–850 (1971).
  • [15] L. Hubert and P. Arabie, “Comparing Partitions.” Journal of Classification, 193–218, 1985.
  • [16] G. W. Milligan and M. C. Cooper, “A study of the comparability of external criteria for hierarchical cluster analysis.” Multivariate Behavioral Research, 21, 441–458, 1986.
  • [17] http://cs.unm.edu/~aaron/research/fastmodularity.htm
  • [18] http://www.orgnet.com/
  • [19] http://www-personal.umich.edu/~mejn/netdata/
  • [20] P. Erdös and A. Rényi, Publ. Math. (Debrecen) 6, 290 (1959).
  • [21] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology.” SIGCOMM 1999.
  • [22] A.-L. Barabási and Z. N. Oltvai, Nature Reviews Genetics 5, 101–113 (2004).