Streaming Algorithms for Diversity Maximization with Fairness Constraints

IEEE International Conference on Data Engineering (ICDE), 2022 (CCF A)

School of Data Science and Engineering | Department of Information and Communication Technologies | Department of Computer Science

Abstract
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1, m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m = 2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
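To make the two ingredients of the problem concrete, the following is a minimal sketch of how a candidate subset could be scored and checked. It only evaluates a given selection; it is not either of the paper's streaming algorithms. The function names `max_min_diversity` and `is_fair`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def max_min_diversity(points):
    """Return the minimum pairwise distance among the selected points.

    This is the max-min diversity objective evaluated on one subset;
    Euclidean distance is an assumption, any metric could be used.
    """
    return min(dist(p, q) for p, q in combinations(points, 2))


def is_fair(groups, quotas):
    """Check that the selection holds exactly quotas[i] elements of group i."""
    counts = Counter(groups)
    no_unknown_groups = set(counts) <= set(quotas)
    return no_unknown_groups and all(
        counts.get(g, 0) == k for g, k in quotas.items()
    )


# Toy selection (hypothetical data): each item is (point, group label).
selected = [((0.0, 0.0), 1), ((3.0, 4.0), 1), ((6.0, 0.0), 2)]
points = [p for p, _ in selected]
groups = [g for _, g in selected]

diversity = max_min_diversity(points)   # smallest of the three pairwise distances
fair = is_fair(groups, {1: 2, 2: 1})    # quotas k_1 = 2, k_2 = 1
print(diversity, fair)
```

A fair diversity maximization algorithm searches for the selection that maximizes `max_min_diversity` among all selections for which `is_fair` holds; the brute-force evaluation here costs $O(k^2)$ distance computations per candidate subset.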
Key words
algorithmic fairness, diversity maximization, max-min dispersion, streaming algorithm
Chat Paper

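Key points: This paper addresses diversity maximization with fairness constraints and proposes two approximation algorithms for the streaming setting that select diverse subsets efficiently while guaranteeing fairness.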

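Method: The study adopts the max-min diversity objective, which selects a subset that maximizes the minimum distance between any two distinct elements within it, and proposes two approximation algorithms, one for the case of two groups and one for an arbitrary number of groups.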
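The objective and constraint described in the abstract can be written formally as follows. This is a sketch using the abstract's notation ($X$, $S$, $k$, $k_i$, $m$, $\varepsilon$); the symbols $d(\cdot,\cdot)$ for the distance function, $X_i$ for the $i$-th group, $\mathrm{div}$ for the objective value, and $S^{*}$ for an optimal fair solution are assumed here for illustration, as is the requirement that the quotas sum to $k$.

```latex
\[
  \mathrm{div}(S) = \min_{\substack{x, y \in S \\ x \neq y}} d(x, y),
  \qquad
  S^{*} = \operatorname*{arg\,max}_{S \subseteq X} \mathrm{div}(S)
  \quad \text{subject to} \quad
  |S \cap X_i| = k_i \ \ \forall i \in [1, m],
  \quad \sum_{i=1}^{m} k_i = k .
\]
% An algorithm is alpha-approximate if its fair output S satisfies
\[
  \mathrm{div}(S) \ge \alpha \cdot \mathrm{div}(S^{*}),
  \qquad
  \alpha = \frac{1-\varepsilon}{4} \ \text{for } m = 2,
  \qquad
  \alpha = \frac{1-\varepsilon}{3m+2} \ \text{for arbitrary } m .
\]
```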

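Experiments: Experiments on real-world and synthetic datasets show that, in the streaming setting, the two proposed algorithms run several orders of magnitude faster than the state-of-the-art algorithms while providing solutions of comparable quality.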