Hierarchical Agglomerative Graph Clustering In Nearly-Linear Time

ICML 2021

Abstract
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs with $n$ vertices and $m$ edges. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$-time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in $\tilde{O}(n\sqrt{m})$ time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as $m = n^{2-\epsilon}$ for some constant $\epsilon > 0$. We complement this result with a simple $\epsilon$-close approximation algorithm for average-linkage in our framework that runs in $\tilde{O}(m)$ time. As an application of our algorithms, we consider clustering points in a metric space by first using k-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets and show that our approach can speed up clustering of point datasets by a factor of 20.7x to 76.5x.
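The abstract does not spell out how graph average-linkage is computed, so the sketch below illustrates the pipeline it describes (build a k-NN graph from points, then run HAC on the weighted graph) under stated assumptions. It assumes the graph form of average linkage, where the similarity of clusters A and B is W(A, B) / (|A| * |B|), with W(A, B) the total weight of edges between them, and clusters with no connecting edge are never merged. The k-NN edge weight 1 / (1 + distance) and the helper names `knn_graph` and `average_linkage_hac` are hypothetical. This is a naive, roughly O(n * m) baseline, not the paper's $\tilde{O}(n\sqrt{m})$ algorithm.

```python
import math


def knn_graph(points, k):
    """Build an undirected k-NN similarity graph.

    Assumption (hypothetical): edge weight = 1 / (1 + Euclidean distance).
    Returns a dict mapping (i, j) with i < j to a similarity weight.
    """
    edges = {}
    for i, p in enumerate(points):
        # Sort the other points by distance to p and keep the k closest.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in neighbors:
            edges[(min(i, j), max(i, j))] = 1.0 / (1.0 + d)
    return edges


def average_linkage_hac(n, edges):
    """Naive graph HAC with average linkage; returns the list of merges.

    Each merge is (cluster_a, cluster_b, similarity). Singleton clusters
    have ids 0..n-1; every merge creates a new id n, n+1, ...
    """
    size = {v: 1 for v in range(n)}
    # Total edge weight between each pair of adjacent clusters (a < b).
    weight = {}
    for (u, v), w in edges.items():
        key = (min(u, v), max(u, v))
        weight[key] = weight.get(key, 0.0) + w

    def similarity(pair):
        a, b = pair
        return weight[pair] / (size[a] * size[b])

    merges = []
    next_id = n
    while weight:
        # Merge the adjacent pair with the highest average-linkage similarity.
        a, b = max(weight, key=similarity)
        merges.append((a, b, similarity((a, b))))
        c = next_id
        next_id += 1
        size[c] = size.pop(a) + size.pop(b)
        # Re-route every edge touching a or b onto the new cluster c,
        # summing parallel edges and dropping edges internal to c.
        new_weight = {}
        for (x, y), w in weight.items():
            x2 = c if x in (a, b) else x
            y2 = c if y in (a, b) else y
            if x2 == y2:
                continue
            key = (min(x2, y2), max(x2, y2))
            new_weight[key] = new_weight.get(key, 0.0) + w
        weight = new_weight
    return merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
    # With k=1 the graph has two components, so HAC stops with two clusters.
    print(average_linkage_hac(len(pts), knn_graph(pts, 1)))
```

With `k=1` the four example points form two disconnected pairs, so the sketch performs only two merges and stops, which shows how graph HAC differs from HAC on a complete distance matrix: sparsifying with k-NN limits which clusters can ever be merged.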