Robust Crowd Bias Correction Via Dual Knowledge Transfer From Multiple Overlapping Sources

Sihong Xie, Qingbo Hu, Jingyuan Zhang, Jing Gao, Wei Fan, Philip S. Yu

2015 IEEE International Conference on Big Data (Big Data) (2015)

Abstract

Crowdsourced or user-generated data are among the largest constituents of big data and contain a wide range of valuable information. However, such data are inherently biased and possibly spammed, making trustworthy information extraction an imperative task. As a special case, we study reviewer-posted ratings for products. Biased ratings can lead to disappointed customers when products are overrated, and to reduced revenue for business owners when products receive undeserved negative ratings. To distill objective measurements of product quality, most existing methods try to infer unbiased ratings from the raw ratings alone, and may fail to overcome the inherent bias and recover the underlying true ratings. Although bias correction improves with help from domain experts, expert effort can be expensive in practice. We exploit the variety of big data and adopt a multiple-source mining approach, which finds trustworthy measurements without domain experts, using instead knowledge crowdsourced and transferred from external domains. We address the challenges that the multiple data sources are 1) inherently heterogeneous, 2) at most partially overlapping, and 3) themselves biased. We explore and analyze the strengths and weaknesses of various knowledge transfer strategies. We then propose Consensus Ranking Dual Transfer (CRDT), which handles the above challenges by identifying "anchor reviewers" as a bridge for robust "dual transfer" and by removing bias in individual sources via consensus ranking aggregation. Experiments on real-world rating datasets demonstrate that the proposed approach delivers more robust bias correction than the baselines and can identify abnormal reviewers.
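To make the idea of consensus ranking aggregation over partially overlapping sources concrete, here is a minimal sketch. It is not the CRDT algorithm, whose details the abstract does not give. It swaps in a normalized Borda count, a standard rank aggregation technique, and the function name `borda_consensus` and the product labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def borda_consensus(source_rankings):
    """Aggregate partial rankings (best first) into one consensus ranking.

    Illustrative stand-in only: a normalized Borda count, not CRDT.
    Each source ranks only the products it covers, so scores are
    normalized by list length and averaged over the sources that
    actually rank a given product.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in source_rankings:
        n = len(ranking)
        for position, product in enumerate(ranking):
            # Normalized Borda score: 1.0 for the top item, 0.0 for the last.
            score = 1.0 if n == 1 else (n - 1 - position) / (n - 1)
            totals[product] += score
            counts[product] += 1
    mean_score = {p: totals[p] / counts[p] for p in totals}
    # Sort by descending mean score; break ties by name for determinism.
    return sorted(mean_score, key=lambda p: (-mean_score[p], p))


# Three hypothetical sources, each covering a different subset of products.
rankings = [
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["A", "C", "D"],
]
consensus = borda_consensus(rankings)
print(consensus)
```

Averaging over only the sources that cover a product is one simple way to cope with partial overlap. It gives every source equal weight, so unlike the approach the abstract describes, it does nothing to correct a source that is itself biased.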

Keywords

robust crowd bias correction, dual knowledge transfer, multiple overlapping sources, big data, crowdsourcing data, user-generated data, trustworthy information extraction, product reviewer-posted ratings, biased ratings, objective product quality measurement, unbiased rating inference, true rating recovery, multiple source mining approach, trustworthy measurement, knowledge crowdsourcing, consensus ranking dual transfer, CRDT, consensus ranking aggregation, abnormal reviewer identification
