Fair k-Center Clustering in MapReduce and Streaming Settings

International World Wide Web Conference(2022)

引用 5|浏览49
暂无评分
摘要
ABSTRACT Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective to identify high-quality fair clusters and efficient as compared to the state-of-the-art.
更多
查看译文
关键词
fairness, k-center clustering, disparate impact
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要