Use of Web Popularity on Entity Centric Document Filtering.

WI-IAT (2015)

Abstract

Filtering pages about an entity (a person, a company, a music band, etc.) so that only the interesting ones are kept is a real challenge. Interest can be characterized by criteria such as recency and novelty. Over the last decade, classification systems have been trained to detect whether a document is of interest with respect to an entity. For scalability reasons, manually annotating a training set for each tracked entity is not feasible, so some approaches strive to build entity-independent systems. These approaches achieve state-of-the-art performance, but we show that they can be improved. Time features differ from one entity to another, so a single classifier cannot estimate relevant statistics from these observations. Instead of having one model per entity or one model for all entities, we propose an approach that uses one model per cluster of entities, where clusters are based on entity web popularity. We also introduce different strategies for automatic classification model selection. We test our approach on the Knowledge Base Acceleration (KBA) framework from TREC and show that it brings significant improvements over a non-cluster-based method.
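
The idea of one model per popularity cluster can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the popularity scores, the rank-based binning in `cluster_by_popularity`, the single toy time feature, the labels, and the nearest-centroid classifier, which stands in for whatever classifier the authors actually train.

```python
from collections import defaultdict

def cluster_by_popularity(popularity, n_clusters):
    """Rank entities by popularity and split them into equal-sized bins.
    (Hypothetical clustering rule; the paper's own rule is not given here.)"""
    ranked = sorted(popularity, key=popularity.get)
    return {entity: min(i * n_clusters // len(ranked), n_clusters - 1)
            for i, entity in enumerate(ranked)}

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_per_cluster(examples, clusters):
    """examples: (entity, features, label) triples.
    Returns one nearest-centroid model per cluster: cluster -> {label: centroid}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entity, features, label in examples:
        grouped[clusters[entity]][label].append(features)
    return {c: {label: centroid(rows) for label, rows in by_label.items()}
            for c, by_label in grouped.items()}

def predict(models, clusters, entity, features):
    """Route the document to the model of its entity's cluster."""
    model = models[clusters[entity]]
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Toy data: a made-up popularity score per entity, and one made-up time
# feature per document (e.g. the number of recent mentions of the entity).
popularity = {"a": 10, "b": 20, "c": 1000, "d": 5000}
clusters = cluster_by_popularity(popularity, n_clusters=2)
examples = [
    ("a", [5.0], "useful"), ("a", [1.0], "not_useful"),
    ("c", [500.0], "useful"), ("c", [100.0], "not_useful"),
]
models = train_per_cluster(examples, clusters)

# Entities "b" and "d" have no annotated documents of their own; each one
# reuses the model trained on the other entities in its cluster.
label_b = predict(models, clusters, "b", [4.0])
label_d = predict(models, clusters, "d", [150.0])
```

In this toy setting, the same feature value can lead to different decisions depending on the cluster, which mirrors the observation that time features are not comparable across entities of very different popularity.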

Keywords

entity popularity, filtering, KBA, classification
