Bilingual News Clustering Using Named Entities And Fuzzy Similarity
TSD'07: Proceedings of the 10th International Conference on Text, Speech and Dialogue (2007)
Abstract
This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we address how news items are represented and how the similarity between documents is calculated. As representative features of each news item, we use the cognate named entities it contains. One of our main goals is to test whether named entities alone are a sufficient source of knowledge for multilingual news clustering. The vector representation of each news item takes into account the category of its named entities. To determine the similarity between two documents, we propose a new approach based on a fuzzy system whose knowledge base tries to incorporate human knowledge about the importance of each named-entity category in the news. We compared our approach with a traditional one and obtained better results on a comparable corpus of news in Spanish and English.
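To make the idea of a category-aware fuzzy similarity concrete, the following is a minimal illustrative sketch and not the system described in the paper. Everything in it is an assumption: the three category labels, the triangular membership functions, the rule base and its output values, and the helper names `category_overlap` and `fuzzy_similarity`. It also assumes that cognate named entities from both languages have already been mapped to a shared normalized form, so that a plain set intersection can stand in for cognate matching.

```python
from typing import Dict, Set

# Hypothetical document representation: named-entity category -> set of
# entity strings, assumed already normalized across Spanish and English.
Doc = Dict[str, Set[str]]


def category_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed triangular membership functions over an overlap score in [0, 1].
def low(x: float) -> float:
    return max(0.0, 1.0 - 2.0 * x)


def medium(x: float) -> float:
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def high(x: float) -> float:
    return max(0.0, 2.0 * x - 1.0)


def fuzzy_similarity(d1: Doc, d2: Doc) -> float:
    """Combine per-category overlaps with a small, invented rule base.

    AND is modeled with min, OR with max, and the rule outputs are
    combined by a firing-strength weighted average.
    """
    p = category_overlap(d1.get("PERSON", set()), d2.get("PERSON", set()))
    o = category_overlap(d1.get("ORGANIZATION", set()), d2.get("ORGANIZATION", set()))
    l = category_overlap(d1.get("LOCATION", set()), d2.get("LOCATION", set()))

    # (firing strength, output similarity) pairs; all values are illustrative.
    rules = [
        (min(high(p), high(l)), 1.0),         # same people in the same places
        (min(high(p), high(o)), 1.0),         # same people and organizations
        (max(medium(p), medium(o)), 0.5),     # partial agreement
        (min(high(l), low(p)), 0.3),          # shared places only is weak evidence
        (min(low(p), low(o), low(l)), 0.0),   # nothing in common
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * out for strength, out in rules) / total


doc_es: Doc = {"PERSON": {"kofi annan"}, "LOCATION": {"irak"}, "ORGANIZATION": {"onu"}}
doc_en: Doc = {"PERSON": {"kofi annan"}, "LOCATION": {"irak"}, "ORGANIZATION": {"onu"}}
print(fuzzy_similarity(doc_es, doc_en))
```

The point of the sketch is only that the rule base lets categories carry different weight: in this invented configuration, sharing locations without sharing people yields a low score, whereas a flat overlap over all entities would treat every match equally.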