Data Acquisition via Experimental Design for Decentralized Data Markets
arXiv (2024)
Abstract
Acquiring high-quality training data is essential for current machine
learning models. Data markets provide a way to increase the supply of data,
particularly in data-scarce domains such as healthcare, by incentivizing
potential data sellers to join the market. A major challenge for a data buyer
in such a market is selecting the most valuable data points from a data seller.
Unlike prior work in data valuation, which assumes centralized data access, we
propose a federated approach to the data selection problem that is inspired by
linear experimental design. Our proposed data selection method achieves lower
prediction error without requiring labeled validation data and can be optimized
in a fast and federated procedure. The key insight of our work is that a method
that directly estimates the benefit of acquiring data for test set prediction
is particularly compatible with a decentralized market setting.