Shamela: A Large-Scale Historical Arabic Corpus

Alexander Magidow
Maxim Romanov

LT4DH@COLING, pp. 45-53, 2016.

Abstract:

Arabic is a widely spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, ...
