Improving neural machine translation with POS-tag features for low-resource language pairs

Heliyon (2022)

Cited by: 8
Abstract
Linguistic features have been widely integrated into statistical machine translation (SMT) systems, where they improve translation quality. However, for low-resource languages such as Thai and Myanmar, linguistic features have not yet been integrated into neural machine translation (NMT) systems. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for bidirectional translation between Thai and Myanmar, Myanmar and English, and Thai and English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are attached to each word on the source side, the target side, or both, and the proposed models are evaluated in each of these configurations. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., plain string data and string data with POS tags) and produce either plain string data or string data with POS tags. A transformer model that uses only word vectors serves as the first baseline for comparison with the proposed models. An Edit-Based Transformer with Repositioning (EDITOR) model serves as the second baseline. The experimental results show that adding linguistic features to transformer-based models improves the performance of neural machine translation for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translations, achieving higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than both the baseline transformer and EDITOR models.
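The abstract describes attaching a POS or UPOS tag to every word and feeding the multi-source models two inputs: the plain string and the tagged string. The sketch below illustrates one possible way to prepare such data. It assumes tags are appended to words with a `|` separator (e.g. `rice|NOUN`); the paper's actual token format is not stated in the abstract, and the helper names `attach_tags`, `strip_tags`, and `multi_source_inputs` are hypothetical.

```python
from typing import List, Tuple


def attach_tags(words: List[str], tags: List[str], sep: str = "|") -> List[str]:
    """Append one POS/UPOS tag to each word, e.g. ('rice', 'NOUN') -> 'rice|NOUN'.

    The '|' separator is an assumption made for illustration only.
    """
    if len(words) != len(tags):
        raise ValueError("each word needs exactly one tag")
    return [f"{w}{sep}{t}" for w, t in zip(words, tags)]


def strip_tags(tokens: List[str], sep: str = "|") -> List[str]:
    """Recover plain words from tagged tokens, e.g. when the target side carries tags.

    rsplit with maxsplit=1 keeps any separator characters inside the word itself.
    """
    return [tok.rsplit(sep, 1)[0] for tok in tokens]


def multi_source_inputs(words: List[str], tags: List[str], sep: str = "|") -> Tuple[str, str]:
    """Build the two encoder inputs: the plain string and the tagged string."""
    plain = " ".join(words)
    tagged = " ".join(attach_tags(words, tags, sep))
    return plain, tagged


if __name__ == "__main__":
    words = ["I", "eat", "rice"]
    upos = ["PRON", "VERB", "NOUN"]
    plain, tagged = multi_source_inputs(words, upos)
    print(plain)                       # I eat rice
    print(tagged)                      # I|PRON eat|VERB rice|NOUN
    print(strip_tags(tagged.split()))  # ['I', 'eat', 'rice']
```

Appending the tag to the word keeps source and target sequence lengths unchanged. An alternative encoding would interleave tags as separate tokens, which doubles the sequence length.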
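The results are reported in BLEU and chrF. To make the chrF metric concrete, the sketch below computes a simplified sentence-level chrF under common default settings: character n-grams of orders 1 to 6 with whitespace removed, precision and recall averaged over the orders, and combined with β = 2, which weights recall more heavily than precision. This is an illustrative assumption, not the paper's evaluation script. Published scores are usually computed at corpus level with a standard tool such as sacreBLEU, whose edge-case handling may differ.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf("the cat sat", "the cat sat"))  # 100.0
    print(chrf("abc", "xyz"))                  # 0.0
    print(chrf("the cat sat", "the mat sat"))  # partial match, between 0 and 100
```

Because chrF works on characters rather than words, it does not depend on word segmentation. This matters for Thai and Myanmar, which are usually written without spaces between words.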
Keywords
Linguistic features, Neural machine translation, Part-of-speech, Transformer architecture, Universal part-of-speech