Classification of Short Texts by Deploying Topical Annotations.

IR: Research and Development in Information Retrieval(2012)

引用 60|浏览54
暂无评分
摘要
We propose a novel approach to the classification of short texts based on two factors: the use of Wikipedia-based annotators that have been recently introduced to detect the main topics present in an input text, represented via Wikipedia pages, and the design of a novel classification algorithm that measures the similarity between the input text and each output category by deploying only their annotated topics and the Wikipedia link-structure. Our approach waives the common practice of expanding the feature-space with new dimensions derived either from explicit or from latent semantic analysis. As a consequence it is simple and maintains a compact intelligible representation of the output categories. Our experiments show that it is efficient in construction and query time, accurate as state-of-the-art classifiers (see e.g. Phan et al. WWW '08), and robust with respect to concept drifts and input sources.
更多
查看译文
关键词
Latent Semantic Analysis, Concept Drift, Input Text, Short Text, Output Category
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要