We propose a novel Progressive Scale Expansion Network to detect text instances with arbitrary shapes in natural scene images

Shape Robust Text Detection With Progressive Scale Expansion Network

CVPR, pp. 9336-9345 (2019)


Abstract

Scene text detection has witnessed rapid progress, especially with the recent development of convolutional neural networks. However, there still exist two challenges which prevent the algorithm from reaching industry applications. On the one hand, most of the state-of-the-art algorithms require a quadrangle bounding box, which is inaccurate to locate ...

Introduction
  • Scene text detection in the wild is a fundamental problem with numerous applications such as scene understanding, product identification, and autonomous driving.
  • For the regression-based approaches [36, 42, 32, 16, 41, 23, 11, 13, 27], the text targets are usually represented in the form of rectangles or quadrangles with certain orientations.
  • The regression-based approaches fail to deal with text instances of arbitrary shape, e.g., the curved texts shown in Fig. 1 (b).
  • Segmentation-based approaches may predict a single false detection that covers all the text instances lying close to each other.
Highlights
  • Scene text detection in the wild is a fundamental problem with numerous applications such as scene understanding, product identification, and autonomous driving
  • Much progress has been made in recent years with the rapid development of Convolutional Neural Networks (CNNs) [9, 14, 31]
  • The regression-based approaches fail to deal with text instances of arbitrary shape, e.g., the curved texts shown in Fig. 1 (b)
  • We propose a novel Progressive Scale Expansion Network (PSENet) to detect text instances with arbitrary shapes in natural scene images
  • By gradually expanding the detected areas from small kernels to large and complete instances via multiple semantic segmentation maps, our method is robust to shapes and can separate text instances which are very close or even partially intersected
  • On CTW1500, a dataset full of long curved texts, PSENet achieves an F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-the-art results by 6.6%
  • The experiments on scene text detection benchmarks demonstrate the superior performance of the proposed method
Methods
  • The authors first introduce the overall pipeline of the proposed Progressive Scale Expansion Network (PSENet).
  • The authors concatenate low-level texture features with high-level semantic features.
  • These maps are further fused in F to encode information with various receptive views.
  • The scales of the different segmentation masks are decided by hyper-parameters, which are discussed in Sec. 3.4.
  • Among these masks, S1 gives the segmentation result for the text instances at the smallest scale and Sn denotes the original segmentation mask.
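The relation between the hyper-parameters and the masks S1 to Sn can be made concrete with a small sketch. It assumes the linear schedule given in the published paper, r_i = 1 - (1 - m)(n - i)/(n - 1), where n is the number of masks and m is the minimal scale ratio, together with the shrink offset d_i = Area x (1 - r_i^2) / Perimeter applied to each ground-truth polygon. The function names are hypothetical, and the actual polygon shrinking (which the paper performs with the Vatti clipping algorithm) is not shown.

```python
import math


def scale_ratios(n, m):
    """Return [r_1, ..., r_n] with r_1 = m (smallest kernel) and r_n = 1 (full mask).

    Assumes the linear schedule r_i = 1 - (1 - m) * (n - i) / (n - 1).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not 0 < m <= 1:
        raise ValueError("m must lie in (0, 1]")
    return [1.0 - (1.0 - m) * (n - i) / (n - 1) for i in range(1, n + 1)]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def shrink_offset(points, ratio):
    """Margin (in pixels) by which the polygon is shrunk for a given scale ratio."""
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)


# Illustrative values only: six masks, minimal scale 0.5, a 10 x 4 text box.
ratios = scale_ratios(n=6, m=0.5)
box = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
offsets = [shrink_offset(box, r) for r in ratios]
```

With this schedule the last ratio is always 1, so the offset for Sn is zero and Sn coincides with the original annotation, while S1 is shrunk the most.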
Results
  • On CTW1500, a dataset full of long curved texts, the real-time model achieves an F-measure of 74.3% at 27 FPS, and the best F-measure (82.2%) outperforms state-of-the-art results by an absolute 6.6%.
  • With only a single-scale setting, the method achieves an F-measure of 85.69%, surpassing state-of-the-art results by more than 3%.
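For readers unfamiliar with the metric, the F-measure quoted in these results is the harmonic mean of precision and recall. The sketch below shows the arithmetic only; the function names are hypothetical, the counts are illustrative and not taken from the paper, and the benchmark-specific rules for matching detections to ground truth are not modelled.

```python
def precision_recall(true_positives, num_detections, num_ground_truth):
    """Precision and recall from matched counts; empty denominators give 0.0."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct detections out of 10, with 16 ground-truth instances.
p, r = precision_recall(true_positives=8, num_detections=10, num_ground_truth=16)
f = f_measure(p, r)
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.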
Conclusion
  • The authors propose a novel Progressive Scale Expansion Network (PSENet) to detect text instances with arbitrary shapes in natural scene images.
  • By gradually expanding the detected areas from small kernels to large and complete instances via multiple semantic segmentation maps, the method is robust to shapes and can separate those text instances which are very close or even partially intersected.
  • The progressive scale expansion algorithm can be introduced to the general instance-level segmentation tasks, especially in those benchmarks with many crowded object instances.
  • The authors are cleaning the code and will release it soon.
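The idea of growing instances from small kernels to complete shapes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the masks are already binarised, uses 4-connectivity, and resolves contested pixels on a first-come-first-served basis through a breadth-first search. The names `label_kernels` and `progressive_scale_expansion` are hypothetical.

```python
from collections import deque

import numpy as np

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def label_kernels(mask):
    """Label the 4-connected components of a binary mask; 0 is background."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(masks):
    """Grow instance labels from the smallest kernels through larger masks.

    masks: list [S1, ..., Sn] of equally shaped binary arrays, smallest first.
    """
    labels = label_kernels(masks[0])
    h, w = labels.shape
    for mask in masks[1:]:
        # Multi-source BFS: every labelled pixel is a seed for this scale.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and mask[ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[cy, cx]  # first label to arrive wins
                    queue.append((ny, nx))
    return labels


# Toy example: two text instances that touch in the full mask s3.
s1 = np.array([[0, 1, 0, 0, 0, 0, 1, 0]])
s2 = np.array([[1, 1, 1, 0, 0, 1, 1, 1]])
s3 = np.array([[1, 1, 1, 1, 1, 1, 1, 1]])
instances = progressive_scale_expansion([s1, s2, s3])
# instances.tolist() == [[1, 1, 1, 1, 2, 2, 2, 2]]
```

In the toy example, labelling the full mask s3 directly would merge both instances into one component, whereas starting from the separated kernels in s1 keeps them apart.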
Tables
  • Table 1: Performance grows with deeper backbones on IC17-MLT
  • Table 2: The single-scale results on CTW1500. “P”, “R” and “F” represent the precision, recall and F-measure respectively
  • Table 3: The single-scale results on Total-Text. “P”, “R” and “F” represent the precision, recall and F-measure respectively
  • Table 4: The single-scale results on IC15. “P”, “R” and “F” represent the precision, recall and F-measure respectively. “1s” and
  • Table 5: The single-scale results on IC17-MLT. “P”, “R” and “F” represent the precision, recall and F-measure respectively
  • Table 6: Time consumption of PSENet on CTW1500. The total time consists of the backbone, the segmentation head and the PSE part. †
Related work
  • Scene text detection based on deep learning methods has achieved remarkable results over the past few years. A majority of modern text detectors are based on the CNN framework, in which scene text detection is roughly divided into two categories: regression-based methods and segmentation-based methods.

    Regression-based methods are often based on general object detection frameworks, such as Faster R-CNN [31] and SSD [22]. TextBoxes [19] modified the anchor scales and the shape of the convolution kernels to adjust to the various aspect ratios of text. EAST [42] uses FCN [25] to directly predict the score map, rotation angle and text boxes for each pixel. RRPN [28] adopted Faster R-CNN and developed rotation proposals in the RPN part to detect arbitrarily oriented text. RRD [20] extracted feature maps for text classification and regression from two separate branches to better detect long text.
Funding
  • This work is supported by the Natural Science Foundation of China under Grant 61672273 and Grant 61832008, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant BK20160021, and Scientific Foundation of State Grid Corporation of China (Research on Icewind Disaster Feature Recognition and Prediction by Fewshot Machine Learning in Transmission Lines)
References
  • Icdar2017 competition on multi-lingual scene text detection and script identification. http://rrc.cvc.uab.es/?ch=8&com=introduction.
  • Chee Kheng Ch’ng and Chee Seng Chan. Total-text: A comprehensive dataset for scene text detection and recognition. In ICDAR, 2017.
  • Pieter-Tjerk De Boer, Dirk P Kroese, Shie Mannor, and Reuven Y Rubinstein. A tutorial on the cross-entropy method. Annals of Operations Research, 2005.
  • Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Pixellink: Detecting scene text via instance segmentation. In AAAI, 2018.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In ICAIS, 2011.
  • Ankush Gupta, Andrea Vedaldi, and Andrew Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, 2016.
  • Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, and Xiaolin Li. Single shot text detector with regional attention. In ICCV, 2017.
  • Wenhao He, Xu-Yao Zhang, Fei Yin, and Cheng-Lin Liu. Deep direct regression for multi-oriented scene text detection. ICCV, 2017.
  • Han Hu, Chengquan Zhang, Yuxuan Luo, Yuzhuo Wang, Junyu Han, and Errui Ding. Wordsup: Exploiting word annotations for character based text detection. In ICCV, 2017.
  • Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. In CVPR, 2017.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu, and Zhenbo Luo. R2cnn: rotational region cnn for orientation robust scene text detection. arXiv preprint arXiv:1706.09579, 2017.
  • Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu, et al. Icdar 2015 competition on robust reading. In ICDAR, 2015.
  • Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
  • Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, and Wenyu Liu. Textboxes: A fast text detector with a single deep neural network. In AAAI, 2017.
  • Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-song Xia, and Xiang Bai. Rotation-sensitive regression for oriented scene text detection. In CVPR, 2018.
  • Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In ECCV, 2016.
  • Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, and Junjie Yan. Fots: Fast oriented text spotting with a unified network. arXiv preprint arXiv:1801.01671, 2018.
  • Yuliang Liu, Lianwen Jin, Shuaitao Zhang, and Sheng Zhang. Detecting curve text in the wild: New dataset and new solution. 2017.
  • Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, and Cong Yao. Textsnake: A flexible representation for detecting text of arbitrary shapes. ECCV, 2018.
  • Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, and Xiang Bai. Multi-oriented scene text detection via corner localization and region segmentation. arXiv preprint arXiv:1802.08948, 2018.
  • Jianqi Ma, Weiyuan Shao, Hao Ye, Li Wang, Hong Wang, Yingbin Zheng, and Xiangyang Xue. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018.
  • Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In IC3DV, 2016.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015.
  • Baoguang Shi, Xiang Bai, and Serge Belongie. Detecting oriented text in natural images by linking segments. In CVPR, 2017.
  • Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on pattern analysis and machine intelligence, 2017.
  • Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In CVPR, 2016.
  • Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
  • Zhi Tian, Weilin Huang, Tong He, Pan He, and Yu Qiao. Detecting text in natural image with connectionist text proposal network. In ECCV, 2016.
  • Bala R Vatti. A generic solution to polygon clipping. Communications of the ACM, 1992.
  • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, and Guangyao Li. Scene text detection with supervised pyramid context network. arXiv preprint arXiv:1811.08605, 2018.
  • Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, and Zhimin Cao. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002, 2016.
  • Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, and Xiang Bai. Multi-oriented text detection with fully convolutional networks. In CVPR, 2016.
  • Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, and Ziyong Feng. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016.
  • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. East: an efficient and accurate scene text detector. arXiv preprint arXiv:1704.03155, 2017.