Dynamic Shapley Value Computation.

ICDE(2023)

引用 1|浏览34
暂无评分
摘要
With the prevalence of data-driven research, data valuation has attracted attention from the computer science field. How to appraise a single datum becomes an imperative problem, especially in the context of machine learning. Shapley value is widely used to fairly measure the contribution of data points in machine learning since it is the unique definition that satisfies all four desired properties: balance, symmetry, additivity, and zero element. However, computing Shapley value is known to be a #P-hard problem. As data is subject to changes, dynamic data exists pervasively in real-world scenarios. Pricing such dynamic data is more challenging due to the prohibitively expensive cost of recalculation from scratch. In this paper, we study the problem of Dynamic Shapley Value Computation, which updates Shapley value when dynamically adding/deleting data points. For adding data points, to prune unnecessary computation of overlapping model utilities, we propose the pivot-based algorithm that can reduce half computation time in general. We also propose the delta-based algorithm to capture Shapley value changes, which requires a smaller sample size to converge. For deleting data points, we present the YN-NN algorithm that derives the new Shapley value from the data structure of precomputed model utilities in an efficient way. Based on Shapley value changes, we give another version of the delta-based algorithm for deleting data points. Besides, we propose heuristic algorithms to draw on experimental observations for both adding and deleting data points. Extensive experimental results demonstrate the efficiency and effectiveness of our proposed algorithms.
更多
查看译文
关键词
adding deleting data points,computer science field,data structure,data valuation,data-driven research,delta-based algorithm,dynamic data,Dynamic Shapley Value Computation,half computation time,machine learning,Shapley value changes,unnecessary computation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要