CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data
CoRR (2024)
Abstract
Neural Machine Translation (NMT) for low-resource languages remains a
challenging task for NLP researchers. In this work, we apply back-translation,
a standard data augmentation method, to a new translation direction:
Cantonese-to-English. We present models, including OpusMT, NLLB, and mBART,
that we fine-tuned on the limited amount of real data together with the
synthetic data we generated using back-translation. We carried out automatic
evaluation using a range of metrics, both lexical-based and embedding-based.
Furthermore, we create a user-friendly interface for the models included in
the CantonMT research project and make it available to facilitate
Cantonese-to-English MT research. Researchers can add more models to this
platform via our open-source CantonMT toolkit.
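
The data augmentation step named in the abstract can be illustrated with a minimal sketch. It follows the textbook formulation of back-translation (monolingual target-side text is translated into the source language by a reverse model, and the resulting synthetic pairs are mixed with the real parallel data); the abstract does not give the exact pipeline, so this is an assumption and not the authors' implementation. The names `back_translate`, `build_training_set`, `toy_reverse_translate`, and `TOY_LEXICON` are hypothetical, and the toy lexicon merely stands in for a real English-to-Cantonese model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text."""
    pairs: List[Pair] = []
    for target in monolingual_target:
        synthetic_source = reverse_translate(target)
        # Skip sentences for which the reverse model produced nothing usable.
        if synthetic_source.strip():
            pairs.append((synthetic_source, target))
    return pairs


def build_training_set(real_pairs: List[Pair], synthetic_pairs: List[Pair]) -> List[Pair]:
    """Concatenate real and synthetic pairs, dropping exact duplicates in order."""
    seen = set()
    combined: List[Pair] = []
    for pair in real_pairs + synthetic_pairs:
        if pair not in seen:
            seen.add(pair)
            combined.append(pair)
    return combined


# Hypothetical stand-in for a reverse (English-to-Cantonese) model.
TOY_LEXICON = {"thank you": "多謝", "good morning": "早晨"}


def toy_reverse_translate(english: str) -> str:
    """Look the sentence up in the toy lexicon; return "" when it is unknown."""
    return TOY_LEXICON.get(english.lower(), "")


real = [("你好", "hello")]
synthetic = back_translate(
    ["Thank you", "Good morning", "unknown phrase"], toy_reverse_translate
)
training_set = build_training_set(real, synthetic)
```

In practice `toy_reverse_translate` would be replaced by a call to a trained translation model, and the combined set would be used to fine-tune the Cantonese-to-English model.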