Guidelines and Framework for a Large Scale Arabic Diacritized Corpus.

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION(2016)

引用 25|浏览131
暂无评分
摘要
This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres. The target size of the annotated corpus is 2 million words. We summarize the guidelines and describe issues encountered during the training of the annotators. We also discuss the challenges posed by the complexity of the Arabic language and how they are addressed. Finally, we present the diacritization annotation procedure and detail the quality of the resulting annotations.
更多
查看译文
关键词
Arabic Diacritization,Guidelines,Annotation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要