Twitter Normalization Via 1-To-N Recovering

Yafeng Ren, Jiayuan Deng, Donghong Ji

WEB INFORMATION SYSTEMS ENGINEERING - WISE 2016, PT I (2016)

Abstract
Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. First, they largely depend on large-scale annotated training data. Second, they assume that each nonstandard token is recovered to exactly one standard word. However, many nonstandard tokens should be recovered to two or more standard words, so the problem remains highly challenging. To address these issues, we propose an unsupervised normalization system based on context similarity. The proposed system does not require any annotated data, and a nonstandard token can be recovered to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.
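The abstract does not spell out how candidates are generated or scored, so the following is only a minimal sketch of the general 1-to-N idea, not the authors' system. It assumes a tiny hypothetical clean corpus (`CLEAN_CORPUS`), a hypothetical hand-written lookup (`VARIANTS`) standing in for unsupervised candidate generation, a left-to-right forward search that splits concatenated tokens into up to three lexicon words, and cosine similarity over sentence co-occurrence counts as the context-similarity measure. The random-walk and spell-checker components named in the keywords are not modeled. All names here are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy "clean" corpus standing in for a large body of standard text.
CLEAN_CORPUS = [
    "i love you so much",
    "see you later tonight",
    "i am going to the game tonight",
    "let me know what you think",
    "thank you so much",
    "i love you too",
]

# Every word seen in the clean corpus counts as a standard word.
LEXICON = {w for sentence in CLEAN_CORPUS for w in sentence.split()}

# Hypothetical hand-written variants; a stand-in for unsupervised candidate generation.
VARIANTS = {
    "u": [["you"]],
    "gonna": [["going", "to"]],
    "2nite": [["tonight"]],
    "2": [["to"], ["too"]],
}


def cooccurrence_profiles(corpus):
    """Map each word to a Counter of the words that share a sentence with it."""
    profiles = {}
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            profile = profiles.setdefault(w, Counter())
            for j, c in enumerate(words):
                if i != j:
                    profile[c] += 1
    return profiles


PROFILES = cooccurrence_profiles(CLEAN_CORPUS)


def segmentations(token, max_parts=3):
    """Forward search: split token left to right into 1..max_parts lexicon words."""
    if token == "":
        return [[]]
    if max_parts == 0:
        return []
    results = []
    for end in range(1, len(token) + 1):
        prefix = token[:end]
        if prefix in LEXICON:
            for rest in segmentations(token[end:], max_parts - 1):
                results.append([prefix] + rest)
    return results


def candidates(token):
    """Candidate word sequences (length 1..N) for a nonstandard token."""
    cands = [list(seq) for seq in VARIANTS.get(token, [])]
    for seg in segmentations(token):
        if seg not in cands:
            cands.append(seg)
    return cands


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(tweet):
    tokens = tweet.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        cands = [] if tok in LEXICON else candidates(tok)
        if not cands:
            out.append(tok)  # standard or unrecoverable tokens pass through unchanged
            continue
        # Context = the other in-lexicon tokens of the same tweet.
        context = Counter(t for j, t in enumerate(tokens) if j != i and t in LEXICON)

        def score(seq):
            # Sum the co-occurrence profiles of every word in the candidate sequence.
            profile = Counter()
            for w in seq:
                profile.update(PROFILES.get(w, Counter()))
            return cosine(context, profile)

        out.extend(max(cands, key=score))  # ties keep the first candidate
    return " ".join(out)


print(normalize("loveyou so much"))    # love you so much
print(normalize("love u 2"))           # love you too
print(normalize("going 2 the game"))   # going to the game
```

In this sketch the same token "2" is recovered as "too" next to "love" but as "to" next to "going ... the game", because each candidate sequence is scored against the surrounding words rather than in isolation. Capping the forward search at three parts keeps the candidate set small; a realistic system would also need single-word candidates (for example from a spell checker) and a much larger clean corpus.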
Keywords
Twitter normalization, Forward search, Random walk, Spell checker