Nesm: A Named Entity Based Proximity Measure For Multilingual News Clustering

PROCESAMIENTO DEL LENGUAJE NATURAL (2012)

Abstract
Measuring the similarity between documents is an essential task in Document Clustering. This paper presents a new metric based on the number and category of the Named Entities shared between news documents. Three different feature-weighting functions and two standard similarity measures were used to evaluate the quality of the proposed measure in multilingual news clustering. The results, obtained on three different collections of comparable news written in English and Spanish, indicate that the new metric in some cases performs better than standard similarity measures such as cosine similarity and the correlation coefficient.
Keywords
Named Entity, Multilingual Clustering, Document Similarity
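
The two standard similarity measures named in the abstract, cosine similarity and the correlation coefficient, can be sketched as follows. This is a minimal illustration, assuming each document is already represented as a vector of feature weights of equal length; the function names and the example vectors are hypothetical, and the correlation coefficient is taken here to be the Pearson coefficient. The proposed named-entity metric is not defined in the abstract, so it is not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


def correlation_coefficient(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    centred_u = [a - mean_u for a in u]
    centred_v = [b - mean_v for b in v]
    # A constant vector has zero variance; the coefficient is undefined
    # in that case and this sketch returns 0.0 via cosine_similarity.
    return cosine_similarity(centred_u, centred_v)


# Hypothetical feature-weight vectors for two news documents.
doc_a = [2.0, 0.0, 1.0, 0.0]
doc_b = [1.0, 0.0, 1.0, 1.0]

print(cosine_similarity(doc_a, doc_b))
print(correlation_coefficient(doc_a, doc_b))
```

Cosine similarity depends only on the direction of the weight vectors, while the correlation coefficient first subtracts each vector's mean, so the two can rank the same document pairs differently.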