Lemmatizing and POS-tagging Akkadian with BabyLemmatizer and Dictionary-Based Post-Correction.

CLARIN Annual Conference(2022)

引用 0|浏览1
暂无评分
摘要
We present BabyLemmatizer, a hybrid lemmatizer and POS-tagger for Akkadian, the language of the ancient Assyrians and Babylonians, documented from 2350 BCE to 100 CE. In our approach the text is first POS-tagged and lemmatized with TurkuNLP trained with human-verified labels, and then post-corrected with dictionary-based methods to improve the lemmatization quality. The post-correction also assigns labels with confidence scores to flag the most suspicious lemmatizations for manual validation. We demonstrate that the presented tool achieves a Lemma+POS labeling accuracy of 94%, and a lemmatization accuracy of 95% in a held-out test set. We also apply lemmatizer to a previously unlemmatized text corpus to test it in practice.
更多
查看译文
关键词
babylemmatizer,lemmatizing,pos-tagging,dictionary-based,post-correction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要