On the Use of Neural Text Generation for the Task of Optical Character Recognition

2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA) (2019)

Abstract
Optical Character Recognition (OCR) is the extraction of textual data from scanned text documents to facilitate their indexing, searching, and editing, and to reduce storage space. Although OCR systems have improved significantly in recent years, they still suffer in situations where the OCR output does not match the text in the original document. Deep learning models have contributed positively to many problems, but their full potential for many other problems is yet to be explored. In this paper we propose a post-processing approach based on the application of deep learning to improve the accuracy of OCR systems (minimizing the error rate). We report on the use of neural network language models to correct characters and words incorrectly predicted by OCR systems. We applied our approach to the IAM handwriting database. Our proposed approach delivers significant accuracy improvements of 20.41% in F-score, 10.86% in character-level comparison using Levenshtein distance, and 20.69% in document-level comparison over previously reported context-based OCR empirical results on the IAM handwriting database.
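The abstract does not spell out how the character-level comparison is computed. As a minimal sketch only, one common way to score OCR output against a reference transcription with Levenshtein distance is shown below. The function names `levenshtein` and `character_accuracy`, and the choice to normalize by the reference length, are assumptions made for illustration and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical score: 1 - (edit distance / reference length),
    clipped to 0 so that very long hypotheses do not go negative."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    distance = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - distance / len(reference))
```

Under this assumed definition, a post-processing step helps at the character level whenever `character_accuracy(reference, corrected)` exceeds `character_accuracy(reference, raw_ocr)` for the same reference text.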
Keywords
Neural text generation,Optical character recognition,OCR,OCR post-processing,language models,neural language model,text generation,text prediction,IAM database,handwritten character recognition