High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline

DAS (2020)

Abstract
Offline handwritten Chinese text recognition (HCTR) has long been an active research topic. Data preprocessing and augmentation techniques are a natural route to building robust, high-performance offline HCTR systems, but they have not been fully explored. In this paper, we propose a data preprocessing and augmentation pipeline and a CNN-ResLSTM model for high-performance offline HCTR. The pipeline consists of three steps: training text sample generation, text sample preprocessing, and text sample synthesis. The CNN-ResLSTM model is derived by introducing residual connections into the RNN part of the CRNN architecture. Experiments show that, with the proposed CNN-ResLSTM, the data preprocessing and augmentation pipeline effectively and robustly improves system performance: on two standard benchmarks, the CASIA-HWDB and the ICDAR-2013 handwriting competition datasets, the proposed approach achieves state-of-the-art results with correct rates of 97.28% and 96.99%, respectively. Furthermore, to make the model more practical, we apply model acceleration and compression techniques to build a fast and compact model without sacrificing accuracy.
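The results above are reported as correct rates (CR), which the abstract does not define. The sketch below assumes the definitions commonly used for the ICDAR 2013 handwriting competition: CR = (N - S - D) / N and accurate rate AR = (N - S - D - I) / N, where N is the number of ground-truth characters and S, D and I are the substitution, deletion and insertion errors from a minimum edit-distance alignment of the recognized text against the ground truth. The function names are hypothetical, and the tie-break between equal-cost alignments (fewest substitutions first) is an arbitrary choice of this sketch, not necessarily the one used by the official evaluation tool.

```python
def edit_ops(ref: str, hyp: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) aligning hyp to ref."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # ref chars with no hyp chars: all deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # extra hyp chars: all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)            # match, no cost
            else:
                diag = (c + 1, s + 1, d, ins)    # substitution
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)        # ref char missing in hyp
            c, s, d, ins = dp[i][j - 1]
            inse = (c + 1, s, d, ins + 1)        # spurious hyp char
            # Tuple comparison: lowest cost first, then fewest S, D, I.
            dp[i][j] = min(diag, dele, inse)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def _totals(refs: list[str], hyps: list[str]) -> tuple[int, int, int, int]:
    """Sum N, S, D, I over paired ground-truth and recognized text lines."""
    n = s = d = i = 0
    for ref, hyp in zip(refs, hyps):
        ls, ld, li = edit_ops(ref, hyp)
        n, s, d, i = n + len(ref), s + ls, d + ld, i + li
    return n, s, d, i


def correct_rate(refs: list[str], hyps: list[str]) -> float:
    n, s, d, _ = _totals(refs, hyps)
    return (n - s - d) / n


def accurate_rate(refs: list[str], hyps: list[str]) -> float:
    n, s, d, i = _totals(refs, hyps)
    return (n - s - d - i) / n
```

Because CR ignores insertion errors, a recognizer that emits extra characters is penalized only by AR; for example, recognizing "abcd" as "abcde" gives CR = 1.0 but AR = 0.75.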
Keywords
Offline Handwritten Chinese Text Recognition (HCTR), Data preprocessing, Data augmentation, CNN-ResLSTM