U-Net Based Architectures For Document Text Detection And Binarization

ADVANCES IN VISUAL COMPUTING, ISVC 2019, PT II(2019)

引用 3|浏览0
暂无评分
摘要
With the increasing popularity of document analysis and recognition systems, text detection (TD) and text binarization (TB) in document images remain challenging tasks. In the paper, we introduced a two-step architecture for the TD task. Firstly, a U-net based model is used to get a text mask in terms of word-level bounding boxes. Secondly, we approximate the mask of the bounding boxes with rectangles using a classic computer vision method. The model achieves state-of-the-art result on document images and outperforms other popular approaches. Moreover, we introduce the Hybrid U-net architecture, which helps to solve the TB and TD problems at the same time. The model demonstrates high results on both problems. The shared convolution encoder allows to reduce the number of parameters and consumed memory compared to separate models without reducing the model performance.
更多
查看译文
关键词
Document processing, Document recognition, Pattern recognition, Semantic segmentation, Text detection, Text binarization, U-net
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要