LAR-WordNet: A Machine-Translated, Pan-Hispanic and Regional WordNet for Spanish.

Advances in Artificial Intelligence - IBERAMIA 2018 (2018)

Abstract
WordNet is one of the most widely used resources in Natural Language Processing (NLP). However, the only WordNet available for Spanish mainly represents the Spanish of Spain, and it is only about half the size of Princeton's WordNet for English. To address these issues, we automatically translate the Princeton WordNet using lemmas and sentences from all available corpora annotated with WordNet senses, obtaining LAS-WordNet. In addition, we enrich the translated version with lexicons of Pan-Hispanic regionalisms extracted from Twitter, obtaining LAR-WordNet. The proposed resources were evaluated on Semantic Textual Similarity (STS) in Spanish and on cross-lingual STS between Spanish and English. The results showed that LAS-WordNet significantly outperformed the current Spanish WordNet and that the regionalisms added to LAR-WordNet do not hinder its performance. Although the proposed resources are noisier than the current Spanish WordNet, their size and representativeness make them suitable for many NLP applications.
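The abstract does not detail how translated lemmas are assigned to synsets. As a rough illustration only, the following sketch assumes each sense-annotated corpus occurrence has already been machine-translated and aligned to a single Spanish lemma, then ranks the Spanish lemmas proposed for each synset by frequency and finally merges entries from a regional lexicon. The record layout, the function names `build_synset_lexicon` and `add_regionalisms`, the `min_votes` threshold, and the NLTK-style synset IDs are all hypothetical and are not the authors' actual method.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (synset_id, english_lemma, spanish_lemma), assumed to
# come from translating a sense-annotated sentence and aligning the annotated lemma.
def build_synset_lexicon(records, min_votes=1):
    """Map each synset to its proposed Spanish lemmas, most frequent first."""
    votes = defaultdict(Counter)
    for synset_id, _en_lemma, es_lemma in records:
        votes[synset_id][es_lemma.lower()] += 1
    lexicon = {}
    for synset_id, counter in votes.items():
        # Keep only lemmas proposed at least `min_votes` times (hypothetical filter).
        kept = [lemma for lemma, n in counter.most_common() if n >= min_votes]
        if kept:
            lexicon[synset_id] = kept
    return lexicon


# Hypothetical regional entries: (synset_id, lemma, country_code).
def add_regionalisms(lexicon, regional_entries):
    """Return a copy of the lexicon with regional lemmas appended, skipping duplicates."""
    enriched = {sid: list(lemmas) for sid, lemmas in lexicon.items()}
    for synset_id, lemma, _country in regional_entries:
        lemmas = enriched.setdefault(synset_id, [])
        if lemma.lower() not in lemmas:
            lemmas.append(lemma.lower())
    return enriched


if __name__ == "__main__":
    records = [
        ("car.n.01", "car", "coche"),
        ("car.n.01", "car", "carro"),
        ("car.n.01", "automobile", "coche"),
    ]
    lexicon = build_synset_lexicon(records)
    print(lexicon)  # {'car.n.01': ['coche', 'carro']}
    regional = [("car.n.01", "auto", "AR"), ("car.n.01", "carro", "MX")]
    print(add_regionalisms(lexicon, regional))  # {'car.n.01': ['coche', 'carro', 'auto']}
```

Frequency voting is used here only because it is a simple way to tolerate noisy translations; raising `min_votes` trades coverage for precision, which mirrors the size-versus-noise trade-off the abstract describes.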
Keywords
Spanish WordNet, Machine-translated WordNet, WordNet, Semantic textual similarity, Cross-lingual textual similarity