TODQA: Efficient Task-Oriented Data Quality Assessment
2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), 2019
Abstract
Data quality assessment is vital for many information services, ranging from sensor networks to smart city systems. Current data quality assessments, however, are often derived from intrinsic data characteristics and disconnected from specific application contexts, or they are inapplicable or inefficient for large datasets. In this work, we propose a novel task-oriented data quality assessment framework that balances intrinsic and contextual quality. We carefully craft the assessment metrics, quantify them, and fuse them to rank candidate datasets by quality for a given task. To improve system efficiency, we design two fast calculation algorithms to quantify the relationship between datasets and the task, as well as the distribution of data items. We conduct extensive evaluations on six public image datasets (460,247 images in total) and four text document datasets (37,372 documents in total) to evaluate the efficacy and efficiency of our design. Experimental results show that our algorithms can save about 90% of computing time with little accuracy loss, which validates the feasibility and effectiveness of our framework for large datasets.
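The abstract states that the metrics are fused to rank candidate datasets, and the keywords name rank aggregation, but the specific fusion scheme is not given here. As an illustration only, the sketch below uses Borda count, a classic rank-aggregation method, not necessarily the authors' choice. The function `borda_aggregate`, the metric names, and the scores are hypothetical, and the sketch assumes that a higher score means better quality and that every metric scores every dataset.

```python
from typing import Dict, List


def borda_aggregate(metric_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Fuse per-metric scores into one dataset ranking using Borda count.

    metric_scores maps a metric name to {dataset name: score}; a higher
    score is assumed to mean better quality on that metric.
    """
    points: Dict[str, int] = {}
    for scores in metric_scores.values():
        # Order datasets best-first on this single metric.
        ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
        n = len(ranked)
        for position, dataset in enumerate(ranked):
            # The best dataset earns n - 1 points, the worst earns 0.
            points[dataset] = points.get(dataset, 0) + (n - 1 - position)
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(points, key=lambda d: (-points[d], d))


# Hypothetical scores for three candidate datasets on three metrics.
metric_scores = {
    "task_relevance": {"A": 0.90, "B": 0.60, "C": 0.70},
    "diversity": {"A": 0.40, "B": 0.80, "C": 0.50},
    "completeness": {"A": 0.95, "B": 0.90, "C": 0.99},
}

print(borda_aggregate(metric_scores))  # ['C', 'A', 'B']
```

Because Borda count uses only the order within each metric, metrics on different scales need no normalization; a weighted sum of normalized scores is a common alternative that keeps magnitude information.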
Keywords
Data quality assessment, Sampling, Locality sensitive hashing, Rank aggregation
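Locality sensitive hashing appears among the keywords, and the abstract's fast algorithms quantify how related a dataset is to the task, but the paper's exact hashing scheme is not described here. As a hedged illustration, the sketch below uses MinHash, the standard LSH family for Jaccard similarity, to estimate the overlap between a set of task terms and a dataset's vocabulary from short fixed-length signatures instead of comparing the full sets. The helper names, the SHA-1-based seeded hashing, and the example terms are all assumptions.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(token: str, seed: int) -> int:
    # Deterministic 64-bit hash; each seed stands in for one random permutation.
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Return a MinHash signature: the minimum hash value for each seed."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot sign an empty token set")
    return [min(_seeded_hash(t, seed) for t in token_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing signature slots estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical example: task keywords vs. the vocabulary of a candidate dataset.
task_terms = {"traffic", "camera", "vehicle", "road"}
dataset_terms = {"traffic", "camera", "vehicle", "pedestrian"}

sig_task = minhash_signature(task_terms, num_hashes=256)
sig_data = minhash_signature(dataset_terms, num_hashes=256)
print(estimate_jaccard(sig_task, sig_data))  # close to the exact value 3/5 = 0.6
```

In a full LSH index, the signatures would also be split into bands and bucketed so that only likely-similar pairs are compared; that step is omitted here for brevity.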