Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia
Proceedings of the First Workshop in South East Asian Language Processing(2023)
摘要
Neural machine translation (NMT) for low-resource local languages in
Indonesia faces significant challenges, including the need for a representative
benchmark and limited data availability. This work addresses these challenges
by comprehensively analyzing training NMT systems for four low-resource local
languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our
study encompasses various training approaches, paradigms, data sizes, and a
preliminary study into using large language models for synthetic low-resource
languages parallel data generation. We reveal specific trends and insights into
practical strategies for low-resource language translation. Our research
demonstrates that despite limited computational resources and textual data,
several of our NMT systems achieve competitive performances, rivaling the
translation quality of zero-shot gpt-3.5-turbo. These findings significantly
advance NMT for low-resource languages, offering valuable guidance for
researchers in similar contexts.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要