Microblog-genre noise and impact on semantic annotation accuracy

Leon Derczynski, Diana Maynard, Niraj Aswani, Kalina Bontcheva

HT (2013)


Abstract

Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Semantic annotation of tweets is typically performed in a pipeline, comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of final stages. This paper presents a characterisation of genre-specific problems at each semantic annotation stage and the impact on subsequent stages. Critically, we evaluate impact on two high-level semantic annotation tasks: named entity detection and disambiguation. Our results demonstrate the importance of making approaches specific to the genre, and indicate a diminishing returns effect that reduces the effectiveness of complex text normalisation.
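The claim that pipeline errors are cumulative can be illustrated with a small sketch. It assumes, purely for illustration, that each stage's errors are independent and that a downstream stage cannot recover from an upstream mistake, so the share of items still correct after a stage is the running product of the per-stage accuracies. The function name and the accuracy figures are hypothetical placeholders, not results from the paper.

```python
def cumulative_accuracy(stage_accuracy):
    """Return the share of items still correct after each pipeline stage.

    Assumes independent errors and no downstream recovery, so the value
    after a stage is the product of all accuracies up to that stage.
    """
    running = 1.0
    after_stage = {}
    for stage, accuracy in stage_accuracy.items():
        running *= accuracy
        after_stage[stage] = running
    return after_stage


# Hypothetical per-stage accuracies, chosen only to show the compounding
# effect; they are NOT measurements reported in the paper.
hypothetical_pipeline = {
    "language_identification": 0.95,
    "tokenisation": 0.95,
    "pos_tagging": 0.90,
    "named_entity_recognition": 0.80,
    "entity_disambiguation": 0.80,
}

for stage, share in cumulative_accuracy(hypothetical_pipeline).items():
    print(f"{stage:28s} {share:.3f}")
```

Under these assumptions, even stages that look reasonably accurate in isolation leave the final disambiguation stage with far less correct input than its own accuracy suggests.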


Keywords

microblog-genre noise, complex text normalisation, semantic annotation stage, high-level semantic annotation task, entity detection, semantic technology, semantic annotation, entity recognition, news text, dynamic nature, semantic annotation accuracy, entity disambiguation, microblog
