Higher-Order Spectral Clustering of Directed Graphs

NeurIPS 2020

Abstract

Clustering is an important topic in algorithms, and has a number of applications in machine learning, computer vision, statistics, and several other research disciplines. Traditional objectives of graph clustering are to find clusters with low conductance. Not only are these objectives applicable only to undirected graphs, they are als...

Introduction
  • Clustering is one of the most fundamental problems in algorithms and has applications in many research fields including machine learning, network analysis, and statistics.
  • In this work the authors study clustering algorithms for digraphs whose cluster structure is defined with respect to the imbalance of edge densities as well as the edge directions between the clusters.
Highlights
  • Clustering is one of the most fundamental problems in algorithms and has applications in many research fields including machine learning, network analysis, and statistics.
  • Consider the international oil trade network [26], which uses a digraph to represent how mineral fuels and oils are imported and exported between countries. Although this highly connected digraph presents little cluster structure with respect to a typical objective function of undirected graph clustering, from an economic point of view it clearly exhibits a cluster structure: there is a cluster of countries mainly exporting oil, a cluster mainly importing oil, and several clusters in the middle of this trade chain.
  • We first perform experiments on graphs generated from the Directed Stochastic Block Model (DSBM), which is introduced in [8].
  • We introduce a path structure into the DSBM, and compare the performance of our algorithm against the others.
  • For given parameters $k, n, p, q, \eta$, a graph randomly chosen from the DSBM is constructed as follows: the overall graph consists of $k$ clusters $S_0, \ldots, S_{k-1}$ of the same size, each of which can be initially viewed as a $G(n, p)$ random graph.
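The construction in the last highlight can be sketched in a few lines of Python. The text above only fixes the within-cluster part (each cluster looks like a $G(n, p)$ random graph); how $q$ and $\eta$ are used is not spelled out here, so the sketch assumes that pairs in different clusters are joined with probability $q$ and that such an edge points from the lower-indexed to the higher-indexed cluster with probability $\eta$. The function name `sample_dsbm` and the fair coin flip for within-cluster directions are likewise assumptions made for illustration.

```python
import random


def sample_dsbm(k, n, p, q, eta, seed=None):
    """Sample a directed graph with k clusters of n vertices each.

    Returns (edges, labels): `edges` is a set of (u, v) pairs meaning u -> v,
    and labels[u] is the cluster index of vertex u.
    """
    rng = random.Random(seed)
    total = k * n
    # Vertices 0..n-1 form cluster 0, n..2n-1 form cluster 1, and so on.
    labels = [u // n for u in range(total)]
    edges = set()
    for u in range(total):
        for v in range(u + 1, total):
            same = labels[u] == labels[v]
            # From the text: each cluster looks like a G(n, p) random graph.
            # Assumption: pairs in different clusters are joined with probability q.
            if rng.random() >= (p if same else q):
                continue
            if same:
                # Assumption: inside a cluster the direction is a fair coin flip.
                forward = rng.random() < 0.5
            else:
                # Here u is in the lower-indexed cluster (because u < v).
                # Assumption: the edge points from the lower-indexed to the
                # higher-indexed cluster with probability eta.
                forward = rng.random() < eta
            edges.add((u, v) if forward else (v, u))
    return edges, labels
```

Under these assumptions, calling `sample_dsbm(3, 10, 0.5, 0.5, 1.0)` yields a graph in which every edge between clusters runs from the lower-indexed to the higher-indexed cluster, while $\eta = 0.5$ removes any directional signal between clusters.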
Results
  • The authors study the structure of clusters with respect to their flow imbalance, and their relation to the bottom eigenspace of the normalised Hermitian Laplacian matrix.
  • The flow ratio of a partition $S_0, \ldots, S_{k-1}$ shares some similarity with the normalised cut value for undirected graph clustering [23]; in this setting, however, an optimal clustering is one that maximises the flow ratio.
  • To relate the optimal clusters $S_0, \ldots, S_{k-1}$ to the eigen-structure of the normalised Laplacian matrix of the graph, the authors define for every optimal cluster $S_j$ ($0 \le j \le k-1$) an indicator vector $\chi_j \in \mathbb{C}^n$ by $\chi_j(u) \triangleq (\omega_{\lceil 2\pi \cdot k \rceil})^j$ if $u \in S_j$ and $\chi_j(u) = 0$ otherwise.
  • Combining these three lemmas with some combinatorial analysis, the authors prove that the symmetric difference between every cluster returned by the algorithm and its corresponding cluster in the optimal partition can be upper bounded, since otherwise the cost value of the returned clusters would contradict Lemma 4.1.
  • Given the adjacency matrix $M \in \mathbb{R}^{n \times n}$ as input, the DD-SYM algorithm computes the matrix $A = M^\top M + M M^\top$, and uses the top $k$ eigenvectors of the random walk matrix $D^{-1}A$ to construct an embedding for $k$-means clustering.
  • The Herm-RW algorithm uses the imaginary unit i to represent directed edges and applies the top k/2 eigenvectors of a random walk matrix to construct an embedding for k-means.
  • When the underlying graph presents a clear flow structure, the authors' algorithm (SimpleHerm) performs significantly better than both the Herm-RW and DD-SYM algorithms, which need multiple eigenvectors.
  • The authors compute the clustering results of the SimpleHerm algorithm on the same dataset from 2002 to 2017, and compare it with the output of the DD-SYM algorithm.
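As a rough illustration of the matrix representations behind the two baselines above, the following pure-Python sketch builds the symmetrised matrix, taken here to be $A = M^\top M + M M^\top$, the random walk matrix $D^{-1}A$, and a Hermitian adjacency matrix that encodes edge direction with the imaginary unit. The function names, the handling of zero-degree rows, and the sign convention ($+i$ for $u \to v$, $-i$ for the reverse entry) are assumptions for illustration; the eigenvector and $k$-means steps are omitted.

```python
def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    Yt = transpose(Y)  # rows of Yt are the columns of Y
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def dd_sym(M):
    """Symmetrised matrix A = M^T M + M M^T of an adjacency matrix M."""
    Mt = transpose(M)
    left, right = matmul(Mt, M), matmul(M, Mt)
    n = len(M)
    return [[left[i][j] + right[i][j] for j in range(n)] for i in range(n)]


def random_walk(A):
    """Random walk matrix D^{-1} A, where D holds the row sums of A.

    Assumption: rows whose sum is zero are left as all zeros.
    """
    out = []
    for row in A:
        d = sum(row)
        out.append([x / d if d else 0.0 for x in row])
    return out


def hermitian_adjacency(M):
    """Hermitian matrix H with H[u][v] = i * w(u, v) - i * w(v, u).

    Assumed convention: an edge u -> v contributes +i at (u, v) and -i at
    (v, u), so H equals its own conjugate transpose.
    """
    n = len(M)
    return [[1j * M[u][v] - 1j * M[v][u] for v in range(n)] for u in range(n)]
```

For a graph in which vertices 0 and 1 both point to vertex 2, `dd_sym` links 0 and 1 through their shared out-neighbour even though no edge joins them directly, which is the effect the symmetrisation is meant to capture.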
Conclusion
  • The primary focus of the work is efficient clustering algorithms for digraphs, whose clusters are defined with respect to the edge directions between different clusters.
  • As shown by the experimental results on the UN Comtrade Dataset, the work could be employed to analyse many practical datasets for which most traditional clustering algorithms do not suffice.
Related Work
  • There is a rich literature on spectral algorithms for graph clustering. For undirected graph clustering, the works most related to ours are [21, 23, 27]. For digraph clustering, [22] proposes to perform spectral clustering on the symmetrised matrix $A = M^\top M + M M^\top$ of the input graph's adjacency matrix $M$; [8] initiates the study of spectral clustering on complex-valued Hermitian matrix representations of digraphs; however, their theoretical analysis only holds for digraphs generated from the stochastic block model. Our work is also linked to analysing higher-order structures of clusters in undirected graphs [4, 5, 28], and community detection in digraphs [7, 19]. The main takeaway is that no previous work analyses digraph spectral clustering algorithms to uncover the higher-order structure of clusters in a general digraph.

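    Throughout the paper, we always assume that $G = (V, E, w)$ is a digraph with $n$ vertices, $m$ edges, and weight function $w : V \times V \to \mathbb{R}_{\ge 0}$. We write $u \leadsto v$ if there is an edge from $u$ to $v$ in the graph. For any vertex $u$, the in-degree and out-degree of $u$ are defined as $d_u^{\mathrm{in}} \triangleq \sum_{v : v \leadsto u} w(v, u)$ and $d_u^{\mathrm{out}} \triangleq \sum_{v : u \leadsto v} w(u, v)$, respectively. We further define the total degree of $u$ by $d_u \triangleq d_u^{\mathrm{in}} + d_u^{\mathrm{out}}$, and define $\mathrm{vol}(S) \triangleq \sum_{u \in S} d_u$ for any $S \subseteq V$. For any sets of vertices $S$ and $T$, the symmetric difference between $S$ and $T$ is defined by $S \triangle T \triangleq (S \setminus T) \cup (T \setminus S)$.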
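The quantities defined in this paragraph are simple to compute directly. The sketch below is one possible rendering in Python; the representation of the weight function as a dictionary keyed by `(u, v)` pairs and the function names are assumptions made for illustration.

```python
def degrees(n, weights):
    """Return (d_in, d_out, d) for vertices 0..n-1.

    `weights` maps each edge (u, v), meaning u -> v, to its weight w(u, v) >= 0.
    """
    d_in = [0.0] * n
    d_out = [0.0] * n
    for (u, v), w in weights.items():
        d_out[u] += w  # edge u -> v adds to the out-degree of u
        d_in[v] += w   # and to the in-degree of v
    # Total degree is the sum of in-degree and out-degree.
    d = [a + b for a, b in zip(d_in, d_out)]
    return d_in, d_out, d


def vol(S, d):
    """Volume of a vertex set S: the sum of total degrees of its vertices."""
    return sum(d[u] for u in S)


def symmetric_difference(S, T):
    """Vertices that lie in exactly one of the sets S and T."""
    return (S - T) | (T - S)
```

Python's built-in `S ^ T` on sets computes the same symmetric difference; the explicit form is used here only to mirror the definition.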
Funding
  • He Sun is supported by an EPSRC Early Career Fellowship (EP/T00729X/1).
References
  • [1] H. An, W. Zhong, Y. Chen, H. Li, and X. Gao. Features and evolution of international crude oil trade relationships: A trading-based network analysis. Energy, 74:254–259, 2014.
  • [2] Futures Industry Association. Total 2017 volume 25.2 billion contracts, down 0.1% from 2016. https://www.fia.org/resources/total-2017-volume-252-billion-contracts-down-01-2016, Jan 2018. Accessed: 2020-06-05.
  • [3] N. B. Behmiri and J. R. P. Manso. Crude oil conservation policy hypothesis in OECD (Organisation for Economic Cooperation and Development) countries: A multivariate panel Granger causality test. Energy, 43(1):253–260, 2012.
  • [4] A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In International Conference on Data Mining, pages 118–126, 2015.
  • [5] A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
  • [6] F. Chung. Spectral graph theory. In CBMS: Conference Board of the Mathematical Sciences, Regional Conference Series, 1997.
  • [7] F. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1–19, 2005.
  • [8] M. Cucuringu, H. Li, H. Sun, and L. Zanetti. Hermitian matrices for clustering directed graphs: insights and applications. In International Conference on Artificial Intelligence and Statistics, 2020.
  • [9] L.-B. Cui, P. Peng, and L. Zhu. Embodied energy, export policy adjustment and China's sustainable development: a multi-regional input-output analysis. Energy, 82:457–467, 2015.
  • [10] N. Cui, Y. Lei, and W. Fang. Design and impact estimation of a reform program of China's tax and fee policies for low-grade oil and gas resources. Petroleum Science, 8(4):515–526, 2011.
  • [11] R. Du, G. Dong, L. Tian, M. Wang, G. Fang, and S. Shao. Spatiotemporal dynamics and fitness analysis of global oil market: Based on complex network. PLoS ONE, 11(10), 2016.
  • [12] A. J. Gates and Y.-Y. Ahn. The impact of random models on clustering similarity. The Journal of Machine Learning Research, 18(1):3049–3076, 2017.
  • [13] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
  • [14] J. D. Hamilton. Historical oil shocks. Technical report, National Bureau of Economic Research, 2011.
  • [15] T. Kastner, K.-H. Erb, and S. Nonhebel. International wood trade and forest change: A global analysis. Global Environmental Change, 21(3):947–956, 2011.
  • [16] L. Kilian. Exogenous oil supply shocks: how big are they and how much do they matter for the US economy? The Review of Economics and Statistics, 90(2):216–240, 2008.
  • [17] Korea Centers for Disease Control & Prevention. Data science for COVID-19. https://www.kaggle.com/kimjihoo/coronavirusdataset, 2020. Accessed: 2020-06-03.
  • [18] J. R. Lee, S. O. Gharan, and L. Trevisan. Multiway spectral partitioning and higher-order Cheeger inequalities. Journal of the ACM, 61(6):37:1–37:30, 2014.
  • [19] E. A. Leicht and M. E. J. Newman. Community structure in directed networks. Physical Review Letters, 100:118703, 2008.
  • [20] H. Li, H. Sun, and L. Zanetti. Hermitian Laplacians and a Cheeger inequality for the Max-2-Lin problem. In 27th Annual European Symposium on Algorithms (ESA), pages 1–14, 2019.
  • [21] R. Peng, H. Sun, and L. Zanetti. Partitioning well-clustered graphs: Spectral clustering works! SIAM J. Comput., 46(2):710–743, 2017.
  • [22] V. Satuluri and S. Parthasarathy. Symmetrizations for clustering directed graphs. In Proceedings of the 14th International Conference on Extending Database Technology, pages 343–354, 2011.
  • [23] J. Shi and J. Malik. Normalized cuts and image segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 731–737, 1997.
  • [24] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.
  • [25] H. Sun and L. Zanetti. Distributed graph clustering and sparsification. ACM Transactions on Parallel Computing, 6(3):17:1–17:23, 2019.
  • [26] United Nations. UN Comtrade free API. https://comtrade.un.org/data/. Accessed: 2020-06-03.
  • [27] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  • [28] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich. Local higher-order graph clustering. In 23rd International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 555–564, 2017.
  • [29] Z. Zhang, H. Lan, and W. Xing. Global trade pattern of crude oil and petroleum products: Analysis based on complex network. In IOP Conference Series: Earth and Environmental Science, volume 153, pages 22–33. IOP Publishing, 2018.
  • [30] W. Zhong, H. An, X. Gao, and X. Sun. The evolution of communities in the international oil trade network. Physica A: Statistical Mechanics and its Applications, 413:42–52, 2014.
Authors
Valdimar Steinar Ericsson Laenen
He Sun