On Optimal Coreset Construction for $(k,z)$-Clustering

arxiv(2023)

引用 0|浏览54
暂无评分
摘要
Constructing small-sized coresets for various clustering problems in different metric spaces has attracted significant attention for the past decade. A central problem in the coreset literature is to understand what is the best possible coreset size for $(k,z)$-clustering in Euclidean space. While there has been significant progress in the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for $k$-means ($z=2$) is $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ [1,2], while the best known lower bound is $\Omega(k\varepsilon^{-2})$ [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., $\varepsilon, k$), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound $\Omega(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq \Omega(k^{-1/(z+2)})$. In view of the prior upper bound $\tilde{O}_z(k \varepsilon^{-z-2})$ [1], the bound is optimal. The new lower bound is surprising since $\Omega(k\varepsilon^{-2})$ [1] is ``conjectured" to be the correct bound in some recent works (see e.g., [1,2]]). (2) For the upper bound, we provide efficient coreset construction algorithms for $(k,z)$-clustering with improved or optimal coreset sizes in several metric spaces. [1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要