Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings
WMT (3), pp. 261-266, 2019.
In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task. Our main approach is based on the LASER toolkit (Language-Agnostic SEntence Representations), which uses an encoder-decoder architecture trained on a parallel corpus to obtain multilingual sentence representations. We then use the ...
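As a rough illustration of how multilingual sentence embeddings can drive parallel corpus filtering, the sketch below scores each sentence pair by the cosine similarity of its two embeddings and keeps pairs above a threshold. The abstract above is cut off before it states how the representations are used, so the cosine scoring, the threshold value, and the names `filter_pairs`, `toy_embed`, and `TOY_VECTORS` are all assumptions made for this example, not the submission's confirmed method. The toy vectors stand in for real encoder output; no LASER API is called.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, embed_src, embed_tgt, threshold):
    """Keep (source, target) pairs whose embedding similarity meets the threshold.

    embed_src and embed_tgt are hypothetical callables mapping a sentence to a
    vector in a shared multilingual space. Returns (source, target, score)
    tuples sorted from highest to lowest score.
    """
    kept = []
    for src, tgt in pairs:
        score = cosine(embed_src(src), embed_tgt(tgt))
        if score >= threshold:
            kept.append((src, tgt, score))
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Toy stand-in for a multilingual encoder (assumption: made-up 2-d vectors).
TOY_VECTORS = {
    "hello": [1.0, 0.0],
    "namaste": [0.9, 0.1],
    "buy now": [0.0, 1.0],
}


def toy_embed(sentence):
    return TOY_VECTORS[sentence]


pairs = [("hello", "namaste"), ("hello", "buy now")]
result = filter_pairs(pairs, toy_embed, toy_embed, threshold=0.5)
```

With these toy vectors only the first pair survives, because the second pair's vectors are orthogonal. In practice the threshold would have to be tuned on held-out data.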