Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings

WMT (3), pp. 261-266, 2019.

DOI: https://doi.org/10.18653/v1/w19-5435

Abstract:

In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task. Our main approach is based on the LASER toolkit (Language-Agnostic SEntence Representations), which uses an encoder-decoder architecture trained on a parallel corpus to obtain multilingual sentence representations. We then use the ...
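
The abstract is cut off before it says how the sentence representations are used, so the following is only a generic illustration of embedding-based corpus filtering, not the authors' scoring method. It assumes each sentence pair already comes with precomputed source and target vectors (for example, from a multilingual encoder), scores each pair by cosine similarity, and keeps pairs above a threshold. The function names, the tuple layout, and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, threshold):
    """Keep sentence pairs whose embedding similarity is at least `threshold`.

    `pairs` is a list of (src_text, tgt_text, src_vec, tgt_vec) tuples
    (a hypothetical layout chosen for this sketch). Returns a list of
    (src_text, tgt_text, score) tuples sorted by descending score.
    """
    kept = []
    for src_text, tgt_text, src_vec, tgt_vec in pairs:
        score = cosine_similarity(src_vec, tgt_vec)
        if score >= threshold:
            kept.append((src_text, tgt_text, score))
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Toy vectors standing in for real sentence embeddings (assumption).
toy_pairs = [
    ("hello", "hola", [1.0, 0.0], [1.0, 0.0]),
    ("hello", "noise", [1.0, 0.0], [0.0, 1.0]),
]
filtered = filter_pairs(toy_pairs, threshold=0.5)
```

With the toy data, only the first pair survives, because its two vectors point in the same direction while the second pair's vectors are orthogonal. In a real setting the threshold would have to be tuned on held-out data.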
