High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline

Canyu Xie, Songxuan Lai, Qianying Liao, Lianwen Jin

DAS (2020)

Abstract

Offline handwritten Chinese text recognition (HCTR) has been a long-standing research topic. To build robust and high-performance offline HCTR systems, it is natural to develop data preprocessing and augmentation techniques, which, however, have not been fully explored. In this paper, we propose a data preprocessing and augmentation pipeline and a CNN-ResLSTM model for high-performance offline HCTR. The data preprocessing and augmentation pipeline consists of three steps: training text sample generation, text sample preprocessing, and text sample synthesis. The CNN-ResLSTM model is derived by introducing residual connections into the RNN part of the CRNN architecture. Experiments show that, with the proposed CNN-ResLSTM, the data preprocessing and augmentation pipeline effectively and robustly improves system performance: on two standard benchmarks, namely the CASIA-HWDB and the ICDAR-2013 handwriting competition dataset, the proposed approach achieves state-of-the-art results with correct rates of 97.28% and 96.99%, respectively. Furthermore, to make our model more practical, we employ model acceleration and compression techniques to build a fast and compact model without sacrificing accuracy.
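The abstract reports results as correct rates but does not define the metric. The sketch below assumes the convention commonly used for the ICDAR-2013 handwriting competition: CR = (N - D - S) / N and AR = (N - D - S - I) / N, where N is the number of ground-truth characters and S, D, I are the substitution, deletion, and insertion errors from an edit-distance alignment. The function names and the tie-breaking order in the backtrace are illustrative assumptions, not taken from the paper.

```python
def error_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from one minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion

    # Backtrace one optimal path. Assumption: on ties, prefer the diagonal
    # move, then deletion, then insertion.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def correct_and_accurate_rate(pairs):
    """Compute (CR, AR) over an iterable of (ground_truth, prediction) strings."""
    total = subs = dels = ins = 0
    for reference, hypothesis in pairs:
        s, d, i = error_counts(reference, hypothesis)
        total += len(reference)
        subs += s
        dels += d
        ins += i
    if total == 0:
        raise ValueError("no ground-truth characters to score")
    cr = (total - dels - subs) / total        # insertions are not penalized
    ar = (total - dels - subs - ins) / total  # insertions are penalized
    return cr, ar
```

Calling `correct_and_accurate_rate` on a list of (ground truth, prediction) text-line pairs gives dataset-level rates. Because CR ignores insertion errors, it is never lower than AR on the same predictions.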

Keywords

Offline Handwritten Chinese Text Recognition (HCTR), Data preprocessing, Data augmentation, CNN-ResLSTM
