On Arabic Stop-Words - A Comprehensive List and a Dedicated Morphological Analyzer.

Communications in Computer and Information Science (2019)

Abstract
Stop-word detection is a key preprocessing step and an important component of many Natural Language Processing applications. For Arabic, stop-word detection is a complex task because of the richness of Arabic morphology and the lack of a commonly accepted list. In this paper, we compile a new comprehensive Arabic stop-word list along with a stop-word analyzer that combines that list with a machine-learning-based approach to select the most probable stop-word. The first step of our approach provides a context-free analysis; in the second step, a Hidden Markov Model selects the stop-word most appropriate to the sentence context. In evaluation, the analyzer achieves an accuracy of over 97%, outperforming state-of-the-art analyzers.
Keywords
Natural Language Processing, Arabic language, Information retrieval, Stop-words, Hidden Markov Model, Viterbi algorithm
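
The two-step idea described in the abstract (a context-free lookup followed by HMM-based selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the tag set (PREP, PRON, OTHER), the one-entry lexicon, and every probability below are made-up assumptions chosen only to show how Viterbi decoding might pick one reading of an ambiguous stop-word such as من, which can be read as a preposition ("from") or as a pronoun ("who"). The morphological analysis of clitics that a real analyzer would need is omitted.

```python
import math

# Hypothetical context-free lexicon: each stop-word maps to its possible tags.
LEXICON = {"من": ["PREP", "PRON"]}

# Toy HMM parameters (illustrative values, not estimated from any corpus).
START_P = {"OTHER": 0.6, "PREP": 0.2, "PRON": 0.2}
TRANS_P = {
    ("OTHER", "OTHER"): 0.6, ("OTHER", "PREP"): 0.3, ("OTHER", "PRON"): 0.1,
    ("PREP", "OTHER"): 0.9, ("PRON", "OTHER"): 0.7,
}
EMIT_P = {
    ("PREP", "من"): 0.4, ("PRON", "من"): 0.3,
    ("OTHER", "خرج"): 0.01, ("OTHER", "البيت"): 0.01,
}


def log_p(p):
    """Log-probability with a small floor so unseen events stay finite."""
    return math.log(max(p, 1e-12))


def analyze(token):
    """Step 1: context-free analysis, returning all candidate tags."""
    return LEXICON.get(token, ["OTHER"])


def disambiguate(tokens):
    """Step 2: Viterbi decoding restricted to each token's candidate tags."""
    if not tokens:
        return []
    # layers[i][tag] = (best log score ending in tag, previous tag)
    layers = [{
        tag: (log_p(START_P.get(tag, 0)) + log_p(EMIT_P.get((tag, tokens[0]), 0)), None)
        for tag in analyze(tokens[0])
    }]
    for token in tokens[1:]:
        prev = layers[-1]
        layer = {}
        for tag in analyze(token):
            score, back = max(
                (prev_score + log_p(TRANS_P.get((prev_tag, tag), 0)), prev_tag)
                for prev_tag, (prev_score, _) in prev.items()
            )
            layer[tag] = (score + log_p(EMIT_P.get((tag, token), 0)), back)
        layers.append(layer)
    # Backtrack from the best final tag.
    tag = max(layers[-1], key=lambda t: layers[-1][t][0])
    path = [tag]
    for layer in reversed(layers[1:]):
        tag = layer[tag][1]
        path.append(tag)
    path.reverse()
    return path


# "He left the house": the middle token is resolved as a preposition.
print(disambiguate(["خرج", "من", "البيت"]))  # ['OTHER', 'PREP', 'OTHER']
```

Restricting each position to the tags returned by the lookup keeps the search small, which is one plausible reason to pair a curated list with a sequence model.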