On The Uniform Concentration Bounds And Large Sample Properties Of Clustering With Bregman Divergences
Clustering with Bregman divergences has been used in the literature to unify centroid-based parametric clustering approaches and to allow the detection of non-spherical clusters in the data. Although empirically useful, the large-sample theoretical aspects of Bregman clustering techniques remain largely unexplored. In this paper, we attempt to bridge the gap between the theory and practice of centroid-based Bregman hard clustering by providing uniform deviation bounds on the clustering objective. Our theoretical analysis relies on the celebrated Vapnik-Chervonenkis (VC) theory, which, although extensively used in supervised learning, remains largely unexplored for bounding empirical risks in unsupervised learning scenarios. In contrast to most theoretical works on clustering, our framework allows the number of features (p) to grow with the number of observations (n). The strong consistency of the sample cluster centroids, under standard assumptions in the literature, also follows as a corollary of this general framework. Furthermore, we show that the rate of convergence is at most of order O(√(log n / n)) under standard regularity conditions.
Bregman divergence, clustering, rate of convergence, strong consistency, uniform deviation bounds, VC theory
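To make the setting concrete, here is a minimal sketch (not from the paper) of centroid-based Bregman hard clustering in Python. It assumes the squared Euclidean divergence, the Bregman divergence generated by φ(x) = ‖x‖², under which the procedure reduces to ordinary k-means; the function names are hypothetical.

```python
import numpy as np

def squared_euclidean(x, c):
    # Bregman divergence generated by phi(x) = ||x||^2: d(x, c) = ||x - c||^2
    return np.sum((x - c) ** 2, axis=-1)

def bregman_hard_clustering(X, k, divergence=squared_euclidean,
                            n_iter=100, seed=0):
    """Lloyd-style alternation: assign each point to the centroid with the
    smallest divergence, then update each centroid as its cluster mean.
    The arithmetic mean minimizes the average Bregman divergence for *any*
    Bregman divergence, which is what makes this update valid in general."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # divergence matrix of shape (n, k)
        d = np.stack([divergence(X, c) for c in centroids], axis=1)
        labels = d.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Swapping `divergence` for, e.g., the generalized KL divergence (on positive data) yields other members of the family studied in the paper, with the same mean-based centroid update.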