Arabic Diacritics Restoration Using Maximum Entropy Language Models

Hossam Shamardan, Yasser Hifny

IEEE Signal Process. Lett. (2023)

Abstract
Restoring the diacritics of input text is crucial for developing Arabic text-to-speech systems: once diacritics are present, phonetic transcription can be implemented with a small set of rules. The preferred approach to recovering Arabic diacritics is based on scoring with $n$-gram language models. This work presents a novel procedure for restoring Arabic diacritics based on maximum entropy (MaxEnt) language models. The MaxEnt language models score each diacritized word given the previous words in the sequence, and an efficient dynamic programming (DP) decoder then retrieves the most probable diacritized word sequence conditioned on the input undiacritized word sequence. The Tashkeela corpus was used to evaluate the effectiveness of the proposed technique. The proposed method was more efficient than the state-of-the-art algorithms available at the time.
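The abstract does not specify the feature set, history length, training procedure, or how candidate diacritizations are generated, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bigram history, a log-linear (MaxEnt) model $P(w \mid h) = \exp\big(\sum_k \lambda_k f_k(h, w)\big) / Z(h)$ with two hypothetical feature families (unigram and bigram indicators) whose hand-set weights stand in for trained $\lambda_k$ values, and a hypothetical `lexicon` mapping each undiacritized word to its diacritized variants (such a list could, for example, be collected from a diacritized corpus like Tashkeela). The names `MaxEntBigramLM`, `viterbi_diacritize`, and the toy words (a simplified Latin transliteration of "the boy wrote") are illustrative assumptions. The decoder is a standard Viterbi-style DP over the candidate lattice.

```python
import math
from typing import Dict, List, Optional, Tuple

BOS = "<s>"  # sentence-start history symbol (assumption)


class MaxEntBigramLM:
    """Toy MaxEnt bigram LM (hypothetical): P(w|h) = exp(sum_k lambda_k f_k(h, w)) / Z(h)."""

    def __init__(self, vocab, unigram_w: Dict[str, float], bigram_w: Dict[Tuple[str, str], float]):
        self.vocab = list(vocab)      # full diacritized vocabulary used for normalization
        self.unigram_w = unigram_w    # weights of unigram-indicator features f(w)
        self.bigram_w = bigram_w      # weights of bigram-indicator features f(h, w)
        self._log_z: Dict[str, float] = {}  # cache of log Z(h), one entry per history

    def score(self, h: str, w: str) -> float:
        # Weighted sum of the features that fire for (h, w); unseen features weigh 0.
        return self.unigram_w.get(w, 0.0) + self.bigram_w.get((h, w), 0.0)

    def log_prob(self, h: str, w: str) -> float:
        if h not in self._log_z:
            # Z(h) sums exp(score) over the whole vocabulary, so P(.|h) sums to 1.
            self._log_z[h] = math.log(sum(math.exp(self.score(h, v)) for v in self.vocab))
        return self.score(h, w) - self._log_z[h]


def viterbi_diacritize(words: List[str], lexicon: Dict[str, List[str]], lm: MaxEntBigramLM) -> List[str]:
    """Return the most probable diacritized sequence for the undiacritized `words`."""
    # Each layer maps a candidate to (best log-prob of a path ending there, previous candidate).
    layers: List[Dict[str, Tuple[float, Optional[str]]]] = []
    prev_layer: Dict[str, Tuple[float, Optional[str]]] = {BOS: (0.0, None)}
    for word in words:
        layer: Dict[str, Tuple[float, Optional[str]]] = {}
        for cand in lexicon[word]:  # assumes every input word appears in the lexicon
            layer[cand] = max(
                ((s + lm.log_prob(p, cand), p) for p, (s, _) in prev_layer.items()),
                key=lambda t: t[0],
            )
        layers.append(layer)
        prev_layer = layer
    # Backtrack from the highest-scoring candidate at the last position.
    cand = max(prev_layer, key=lambda c: prev_layer[c][0])
    path: List[str] = []
    for layer in reversed(layers):
        path.append(cand)
        cand = layer[cand][1]
    return path[::-1]


# Hypothetical toy data: candidates and feature weights are invented for illustration.
lexicon = {
    "Alwld": ["Alwaladu", "Alwalada", "Alwaladi"],
    "ktb": ["kataba", "kutiba", "kutub"],
}
vocab = {c for cands in lexicon.values() for c in cands}
unigram_w = {"Alwaladu": 0.7, "Alwalada": 0.6, "Alwaladi": 0.6,
             "kataba": 1.0, "kutiba": 0.5, "kutub": 0.8}
bigram_w = {(BOS, "Alwaladu"): 1.0, ("Alwaladu", "kataba"): 1.5}
lm = MaxEntBigramLM(vocab, unigram_w, bigram_w)

print(viterbi_diacritize(["Alwld", "ktb"], lexicon, lm))  # ['Alwaladu', 'kataba']
```

Because $Z(h)$ depends only on the history, it can be cached per history; with a bigram history the DP costs $O(nK^2)$ language-model lookups for $n$ words with at most $K$ candidates each. Longer MaxEnt histories would enlarge the DP state accordingly.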
Keywords
Arabic diacritics restoration, maximum entropy language models