Translation of Multiword Expressions Using Parallel Suffix Arrays

AMTA(2008)

Abstract
Accurately translating multiword expressions is important to obtain good performance in machine translation, cross-language information retrieval, and other multilingual tasks in human language technology. Existing approaches to inducing translation equivalents of multiword units have focused on agglomerating individual words or on aligning words in a statistical machine translation system. We present a different approach based upon information theoretic heuristics and the exact counting of frequencies of occurrence of multiword strings in aligned parallel corpora. We are applying a technique introduced by Yamamoto and Church that uses suffix arrays and longest common prefix arrays. Evaluation of the method in multiple language pairs was performed using bilingual lexicons of domain-specific terminology as a gold standard. We found that performance of 50-70%, as measured by mean reciprocal rank, can be obtained for terms that occur more than 10 or so times.
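To make the two technical ingredients of the abstract concrete, the sketch below builds a word-level suffix array and longest common prefix (LCP) array over a toy corpus, counts exact occurrences of a multiword string, and computes mean reciprocal rank against a gold lexicon. It is a minimal illustration only, not the authors' implementation: the function names, the naive construction, and the data layout (dictionaries of ranked candidates) are all assumptions made for this example. The Yamamoto and Church technique uses the LCP array to obtain frequencies for all substrings at once, which this sketch does not attempt.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(tokens):
    """Return suffix start positions, sorted by the token sequence they begin.

    Naive construction (hypothetical helper); adequate for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def build_lcp_array(tokens, sa):
    """lcp[k] = number of leading tokens shared by suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = tokens[sa[k - 1]:], tokens[sa[k]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


def term_frequency(tokens, sa, phrase):
    """Count exact occurrences of `phrase` (a list of tokens).

    All suffixes starting with `phrase` are contiguous in the suffix array,
    so the count is the width of that block, found by binary search.
    """
    m = len(phrase)
    # Truncating sorted suffixes to m tokens keeps them in sorted order.
    prefixes = [tokens[i:i + m] for i in sa]
    return bisect_right(prefixes, phrase) - bisect_left(prefixes, phrase)


def mean_reciprocal_rank(ranked_candidates, gold):
    """Average of 1/rank of the gold translation; 0 when it is not proposed.

    `ranked_candidates` maps a source term to a ranked list of candidate
    translations; `gold` maps the same term to its reference translation.
    Both structures are assumptions of this sketch.
    """
    total = 0.0
    for term, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold[term]:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)
```

For example, with `tokens = "the suffix array and the suffix tree and the lcp array".split()` and `sa = build_suffix_array(tokens)`, calling `term_frequency(tokens, sa, ["the", "suffix"])` counts the two places where that bigram occurs. A real system would replace the naive sort with a linear-time or near-linear-time suffix array construction.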
Keywords
gold standard,machine translation,mean reciprocal rank,human language technology