Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings
WMT (3), pp. 261-266, 2019.
Abstract:
In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task. Our main approach is based on the LASER toolkit (Language-Agnostic SEntence Representations), which uses an encoder-decoder architecture trained on a parallel corpus to obtain multilingual sentence representations. We then use the ...
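The abstract breaks off before it says how the LASER sentence representations become a keep-or-discard decision for each sentence pair. As an illustration only, the sketch below assumes one common way to use multilingual sentence embeddings for filtering: score each candidate pair by the cosine similarity of its source and target embeddings, then keep pairs at or above a threshold. The functions `cosine` and `score_and_filter`, the threshold value, and the toy 2-dimensional vectors are hypothetical stand-ins (real LASER embeddings are 1024-dimensional and would come from the toolkit's encoder). This is not necessarily the scoring function used in the submission.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as unrelated to anything
    return dot / (norm_u * norm_v)


def score_and_filter(
    src: List[Vector], tgt: List[Vector], threshold: float
) -> List[Tuple[int, float]]:
    """Score aligned pairs (src[i], tgt[i]); keep those scoring >= threshold.

    Assumption: plain cosine similarity is the pair score. Returns
    (pair index, score) tuples sorted from best to worst.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target embedding lists must be aligned")
    scored = [(i, cosine(s, t)) for i, (s, t) in enumerate(zip(src, tgt))]
    kept = [(i, score) for i, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for encoder output.
src_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt_vecs = [[0.9, 0.1], [1.0, 0.0], [1.0, 1.0]]
result = score_and_filter(src_vecs, tgt_vecs, threshold=0.5)
print(result)
```

Here pair 1 has orthogonal embeddings (score 0.0) and is dropped, while pairs 2 and 0 are kept, best first. Returning the kept pairs sorted by score also makes it possible to select the top-scoring pairs up to a size budget instead of relying on a single fixed threshold.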