Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

Sifat Md. Habibur Rahman, Rahman Chowdhury Rafeed, Rafsan Mohammad, Rahman Md. Hasibur

2020 IEEE Region 10 Symposium (TENSYMP) (2020)

Abstract

When writing Bengali with an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module depends largely on the error dataset it is built on, and generating such a dataset manually is cumbersome. In this research, we present an algorithm that automatically generates misspelled Bengali words from correct words by analyzing Bengali writing patterns on a QWERTY-layout English keyboard. As part of our analysis, we have compiled a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, frequently mispressed insertion-prone clusters, and a set of rules for handling Juktakkhar (consonant letter clusters) while generating errors.
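
The cluster-based generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the cluster tables, the function names, and the use of romanized (QWERTY-typed) strings are all assumptions made for illustration, since the abstract does not list the actual clusters, and Juktakkhar handling is left out entirely.

```python
import random

# Hypothetical cluster tables; the paper's real lists are not given in the abstract.
PHONETIC_CLUSTERS = {"sh": ["s", "ch"], "i": ["ee"], "o": ["u"]}   # sound-alike swaps
MISPRESS_REPLACE = {"a": ["s", "q"], "k": ["j", "l"], "o": ["i", "p"]}  # neighbouring keys
MISPRESS_INSERT = {"a": ["s"], "k": ["l"], "r": ["t"]}  # extra key hit after a key


def replace_errors(word, clusters):
    """Return variants with one occurrence of a cluster swapped for an alternative."""
    variants = []
    for cluster, alternatives in clusters.items():
        start = word.find(cluster)
        while start != -1:
            for alt in alternatives:
                variants.append(word[:start] + alt + word[start + len(cluster):])
            start = word.find(cluster, start + 1)
    return variants


def insertion_errors(word, clusters):
    """Return variants with one stray character inserted after a listed key."""
    variants = []
    for i, ch in enumerate(word):
        for extra in clusters.get(ch, []):
            variants.append(word[:i + 1] + extra + word[i + 1:])
    return variants


def generate_misspellings(word, limit=None, seed=0):
    """Collect single-error variants of `word`, optionally sampling `limit` of them."""
    candidates = set(replace_errors(word, PHONETIC_CLUSTERS))
    candidates |= set(replace_errors(word, MISPRESS_REPLACE))
    candidates |= set(insertion_errors(word, MISPRESS_INSERT))
    candidates.discard(word)  # a "misspelling" must differ from the correct word
    result = sorted(candidates)
    if limit is not None and limit < len(result):
        result = random.Random(seed).sample(result, limit)
    return result


if __name__ == "__main__":
    # "asho" is a romanized example input chosen for illustration.
    print(generate_misspellings("asho"))
```

Each variant here contains exactly one error; a fuller generator might combine several error types per word or weight clusters by observed frequency, which the abstract does not specify.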

Keywords

Bengali error dataset, Phonetically similar, Consonant cluster, Spell checker
