Streaming cross document entity coreference resolution
COLING (Posters) (2010)
Abstract
Previous research in cross-document entity coreference has generally been restricted to the offline scenario, where the set of documents is provided in advance. As a consequence, the dominant approach is based on greedy agglomerative clustering techniques that rely on pairwise vector comparisons and thus require O(n²) space and time. In this paper we explore identifying coreferent entity mentions across documents in high-volume streaming text, including methods for utilizing orthographic and contextual information. We test our methods on several corpora to quantitatively measure both the efficacy and the scalability of our streaming approach. We show that our approach scales to data at least an order of magnitude larger than previously reported methods.
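To make the contrast concrete: an offline method that compares every pair of n mentions performs n(n-1)/2 comparisons, whereas a streaming method sees each mention once and compares it only against what it has kept so far. The following is a minimal sketch of that single-pass idea, not the method evaluated in the paper. The function names, the lowercased-name match standing in for orthographic information, the bag-of-words context vectors, and the 0.3 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def stream_cluster(mentions, threshold=0.3):
    """Assign each (name, context) mention to an entity cluster in one pass.

    Hypothetical illustration: mentions are grouped by lowercased name
    (a crude orthographic match), and each candidate entity under that
    name keeps a summed context vector (assumed representation).
    """
    clusters = {}  # name key -> list of context Counters, one per entity
    labels = []
    for name, context in mentions:
        key = name.lower()
        vec = Counter(context.lower().split())
        candidates = clusters.setdefault(key, [])
        best, best_sim = None, threshold
        # Compare only against entities that share the name key.
        for i, centroid in enumerate(candidates):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No sufficiently similar entity: start a new one.
            candidates.append(vec)
            best = len(candidates) - 1
        else:
            # Fold the new context into the matched entity's vector.
            candidates[best].update(vec)
        labels.append((key, best))
    return labels


mentions = [
    ("John Smith", "senator voted on the budget bill"),
    ("John Smith", "striker scored two goals in the match"),
    ("john smith", "the senator proposed a new budget"),
]
labels = stream_cluster(mentions)
```

In this toy input the first and third mentions share enough context words to land in the same cluster, while the second starts a new one. Each mention is processed once and never revisited, which is the property that distinguishes the streaming setting from offline agglomerative clustering.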
Keywords
offline scenario, previous reported method, dominant approach, approach scale, magnitude larger data, cross-document entity coreference, greedy agglomerative, previous research, contextual information, coreferent entity