Got Many Labels? Deriving Topic Labels From Multiple Sources For Social Media Posts Using Crowdsourcing And Ensemble Learning

WWW '15: 24th International World Wide Web Conference Florence Italy May, 2015(2015)

引用 16|浏览88
暂无评分
摘要
Online search and item recommendation systems are often based on being able to correctly label items with topical keywords. Typically, topical labelers analyze the main text associated with the item, but social media posts are often multimedia in nature and contain contents beyond the main text. Topic labeling for social media posts is therefore an important open problem for supporting effective social media search and recommendation.In this work, we present a novel solution to this problem for Google+ posts, in which we integrated a number of different entity extractors and annotators, each responsible for a part of the post (e.g. text body, embedded picture, video, or web link). To account for the varying quality of different annotator outputs, we first utilized crowdsourcing to measure the accuracy of individual entity annotators, and then used supervised machine learning to combine different entity annotators based on their relative accuracy. Evaluating using a ground truth data set, we found that our approach substantially outperforms topic labels obtained from the main text, as well as naive combinations of the individual annotators. By accurately applying topic labels according to their relevance to social media posts, the results enables better search and item recommendation.
更多
查看译文
关键词
Crowdsourcing,Topic annotator
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要