KC4MT: A High-Quality Corpus for Multilingual Machine Translation

Van-Vinh Nguyen, Ha Nguyen-Tien, Huong Le-Thanh, Phuong-Thai Nguyen, Van-Tan Bui, Nghia-Luan Pham, Tuan-Anh Phan, Minh-Cong Nguyen Hoang, Hong-Viet Tran, Huu-Anh Tran

LREC 2022: Thirteenth International Conference on Language Resources and Evaluation (2022)

Abstract
Multilingual parallel corpora are an important resource for many applications of natural language processing (NLP). In machine translation, the size and quality of the training corpus strongly affect the quality of the resulting translation models. In this work, we present a method for building a high-quality multilingual parallel corpus in the news domain for several low-resource languages, including Vietnamese, Lao, and Khmer, with the aim of improving multilingual machine translation for these languages. We also publicly release the corpus, which contains 500,000 Vietnamese-Chinese bilingual sentence pairs, 150,000 Vietnamese-Lao bilingual sentence pairs, and 150,000 Vietnamese-Khmer bilingual sentence pairs.
Key words
Multilingual parallel corpus, low-resource languages, language resource, parallel corpus, machine translation