Fully Scalable MPC Algorithms for Clustering in High Dimension

Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, Pavel Veselý

CoRR (2023)

Abstract

We design new algorithms for $k$-clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine is $n^{\sigma}$ for arbitrarily small fixed $\sigma>0$. Importantly, the local memory may be substantially smaller than $k$. Our algorithms take $O(1)$ rounds and achieve $O(1)$-bicriteria approximation for $k$-Median and for $k$-Means, namely, they compute $(1+\varepsilon)k$ clusters of cost within $O(1/\varepsilon^2)$-factor of the optimum. Previous work achieves only $\mathrm{poly}(\log n)$-bicriteria approximation [Bhaskara et al., ICML'18], or handles a special case [Cohen-Addad et al., ICML'22]. Our results rely on an MPC algorithm for $O(1)$-approximation of facility location in $O(1)$ rounds. A primary technical tool that we develop, and may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing certain statistics on an approximate neighborhood of every data point, which includes range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].
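To make the notion of "statistics on an approximate neighborhood" concrete, the following is a minimal sequential sketch of approximate range counting with a plain axis-aligned grid. It is not the paper's MPC primitive and does not use consistent hashing; the function name `approx_range_counts` and the sample points are hypothetical. Each point is hashed to a grid cell of side $r$, and its count covers its own cell plus all adjacent cells. Every point within distance $r$ is counted, and every counted point lies within distance $2r\sqrt{d}$, so the counted set is an approximate neighborhood. The loop over $3^d$ adjacent cells is exponential in the dimension $d$, which is one reason a naive grid does not scale to high dimension.

```python
from collections import Counter
from itertools import product
import math


def approx_range_counts(points, r):
    """For each point, count the points in its grid cell and all adjacent cells.

    Illustrative only: a plain grid with cells of side r, run on one machine.
    """
    dim = len(points[0])

    def cell(p):
        # Map a point to the integer index of its grid cell.
        return tuple(math.floor(x / r) for x in p)

    # Number of points hashed to each non-empty cell.
    buckets = Counter(cell(p) for p in points)

    # All 3^dim offsets to a cell's neighbors (including the cell itself).
    offsets = list(product((-1, 0, 1), repeat=dim))

    counts = []
    for p in points:
        c = cell(p)
        total = 0
        for o in offsets:
            neighbor = tuple(ci + oi for ci, oi in zip(c, o))
            total += buckets.get(neighbor, 0)
        counts.append(total)
    return counts


# Hypothetical sample data in the plane.
points = [(0.1, 0.1), (0.9, 0.2), (1.1, 0.1), (5.0, 5.0)]
counts = approx_range_counts(points, 1.0)
print(counts)
```

In this sample, the first three points fall in two adjacent cells and are counted together, while the far point is counted alone.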

Keywords

scalable MPC algorithms, clustering, high dimension
