WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

Sun Shuo
Gong Hongyu
Guzmán Francisco
Other Links: arxiv.org

Abstract:

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages. We do not limit the extraction process to alignments with English, but systematically consider all possible languag...
