An experimental comparison of explicit semantic analysis implementations for cross-language retrieval

Natural Language Processing and Information Systems (2010)

Abstract
Explicit Semantic Analysis (ESA) has recently been proposed as an approach to computing semantic relatedness between words (and, indirectly, between texts). It therefore has a natural application in information retrieval, where it shows the potential to alleviate the vocabulary mismatch problem inherent in standard bag-of-words models. The ESA model has also recently been extended to cross-lingual retrieval settings, which can be regarded as an extreme case of the vocabulary mismatch problem. ESA in fact represents a class of approaches and allows for various instantiations. As our first contribution, we generalize ESA in order to show clearly the degrees of freedom it provides. Second, we propose variants of ESA along different dimensions and test their impact on performance in a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). These results fill a gap, since a systematic investigation has been missing so far, and they show that the differences between basic design choices are significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated before, but that they can still be significantly improved by tuning the right parameters, yielding a relative improvement on the cross-lingual mate retrieval task of between 62% (Multext) and 237% (JRC-ACQUIS) with respect to the original ESA model.
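To make the setting in the abstract concrete, the following is a minimal sketch of ESA-style mate retrieval, not the authors' implementation. It assumes a toy concept space of three articles aligned across English and German, a plain TF-IDF sum as the term-to-concept association, and cosine similarity as the relatedness measure. All names (`CONCEPTS`, `build_index`, `esa_vector`, `mate_retrieval_top1`) and all data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy concept space: position i in each list is the same concept,
# described by one article per language (an assumption of this sketch).
CONCEPTS = {
    "en": [
        "court law judge legal ruling",
        "fish sea fishing boat quota",
        "tax customs duty tariff trade",
    ],
    "de": [
        "gericht recht richter urteil",
        "fisch meer fischerei boot quote",
        "steuer zoll abgabe tarif handel",
    ],
}

def build_index(articles):
    """Return one {term: TF-IDF weight} dict per concept article."""
    n = len(articles)
    tfs = [Counter(a.split()) for a in articles]
    df = Counter(t for tf in tfs for t in tf)  # document frequency per term
    return [{t: c * math.log(1 + n / df[t]) for t, c in tf.items()} for tf in tfs]

def esa_vector(text, index):
    """Map a text to concept space: one association score per concept."""
    terms = Counter(text.split())
    return [sum(c * weights.get(t, 0.0) for t, c in terms.items()) for weights in index]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mate_retrieval_top1(queries, candidates, q_index, c_index):
    """Fraction of queries whose top-ranked candidate is the translation at the same position."""
    cand_vecs = [esa_vector(c, c_index) for c in candidates]
    hits = 0
    for i, q in enumerate(queries):
        qv = esa_vector(q, q_index)
        best = max(range(len(candidates)), key=lambda j: cosine(qv, cand_vecs[j]))
        hits += best == i
    return hits / len(queries)

INDEX = {lang: build_index(arts) for lang, arts in CONCEPTS.items()}
EN_DOCS = ["the judge issued a ruling", "fishing quota for each boat"]
DE_DOCS = ["der richter sprach ein urteil", "fischerei quote pro boot"]  # translations ("mates")
ACCURACY = mate_retrieval_top1(EN_DOCS, DE_DOCS, INDEX["en"], INDEX["de"])
```

Because both languages are projected onto the same aligned concepts, the English and German documents can be compared directly even though they share no vocabulary. Swapping the association function in `esa_vector` or the similarity in `cosine` is one way to explore the kind of design variation the abstract refers to.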
Keywords
vocabulary mismatch problem, original ESA model, experimental comparison, cross-lingual mate retrieval task, explicit semantic analysis implementation, cross-language retrieval, different dimension, different basic design choice, information retrieval, original ESA implementation, ESA model, ESA approach, retrieval setting, degree of freedom, explicit semantic analysis, computational semantics, bag of words