Classification of Short Texts by Deploying Topical Annotations.

Daniele Vitale, Paolo Ferragina, Ugo Scaiella

IR: Research and Development in Information Retrieval (2012)

Abstract

We propose a novel approach to the classification of short texts based on two factors: the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them via Wikipedia pages, and the design of a novel classification algorithm that measures the similarity between the input text and each output category by deploying only their annotated topics and the Wikipedia link structure. Our approach waives the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in construction and query time, as accurate as state-of-the-art classifiers (see e.g. Phan et al., WWW '08), and robust with respect to concept drift and input sources.
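
The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea of comparing a text with a category through annotated topics and link structure. It assumes that topics are already available as Wikipedia page titles (the annotation step is not shown), that relatedness between two pages can be approximated by the Jaccard overlap of their in-link sets (a stand-in, not the authors' measure), and that a category is scored by the average best match of each text topic. The names `IN_LINKS`, `relatedness`, `category_score`, `classify`, and the toy link data are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy link graph: page title -> set of pages that link to it.
# A real system would read this from a Wikipedia link dump.
IN_LINKS: Dict[str, Set[str]] = {
    "Football": {"Sport", "Team", "Goal", "League"},
    "Goalkeeper": {"Sport", "Team", "Goal"},
    "Stock market": {"Finance", "Bank", "Investor"},
    "Inflation": {"Finance", "Bank", "Economy"},
}


def relatedness(a: str, b: str) -> float:
    """Jaccard overlap of in-link sets (assumed stand-in for link-based relatedness)."""
    if a == b:
        return 1.0
    links_a = IN_LINKS.get(a, set())
    links_b = IN_LINKS.get(b, set())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def category_score(text_topics: List[str], category_topics: List[str]) -> float:
    """Average, over the text's topics, of the best relatedness to any category topic."""
    if not text_topics or not category_topics:
        return 0.0
    best_matches = [
        max(relatedness(t, c) for c in category_topics) for t in text_topics
    ]
    return sum(best_matches) / len(text_topics)


def classify(text_topics: List[str], categories: Dict[str, List[str]]) -> str:
    """Return the category whose topic set scores highest for the given text topics."""
    return max(categories, key=lambda name: category_score(text_topics, categories[name]))


# Each category is represented only by a small set of topics (page titles).
CATEGORIES = {"Sports": ["Football"], "Business": ["Stock market"]}

# Topics assumed to have been produced by an annotator for a short text.
label = classify(["Goalkeeper"], CATEGORIES)
print(label)
```

In this sketch each category stays a short list of page titles, which mirrors the abstract's point about a compact, intelligible category representation without an expanded feature space.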

Keywords

Latent Semantic Analysis, Concept Drift, Input Text, Short Text, Output Category
