Shamela: A Large-Scale Historical Arabic Corpus
LT4DH@COLING, pp. 45-53, 2016.
Arabic is a widely spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, ...