WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Abstract:
We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages. We do not limit the extraction process to alignments with English, but systematically consider all possible languag...