Scene Text Detection with Supervised Pyramid Context Network.
National Conference on Artificial Intelligence (AAAI), 2019
Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable amount of false positives when applied to images captured in real-world environmen...
- Reading text in the wild, as a fundamental task in the field of computer vision, has been widely studied.
- Most previous works mainly focus on several challenging issues in natural scene text detection, such as multi-oriented text (Lyu et al 2018b), large aspect ratios (Liao et al 2018), and difficulty in separating adjacent text instances (Deng et al 2018).
- The first challenge is false positives (FP)
- Some specific scenarios such as autonomous driving require high precision in text detection.
- TextSnake (Long et al 2018) uses ordered disks to represent curved text, but it still needs time-consuming and complicated post-processing
- Due to the large differences in foreground text and background objects, as well as the variety of text changes in shape, color, font, orientation and scale, together with extreme illumination and occlusion, there are still many challenges to be addressed for text detection in natural scenes
- We propose a shape robust text detector guided by semantic information
- We have presented a shape robust text detector that can detect text with arbitrary shapes
- We will investigate more efficient and faster text detection networks that can run on mobile phones
- Method (baseline): false positive problems often appear in complex natural scenes.
- The proposed method can be flexibly applied to different types of scene text detection datasets without special modifications.
- Compared methods (∗ means multi scale test): CTPN (Tian et al 2016), SegLink (Shi, Bai, and Belongie 2017), MCN (Liu et al 2018), SSTD (He et al 2017b), WordSup∗ (Hu et al 2017), EAST∗ (Zhou et al 2017), Lyu et al (2018b), DeepReg (He et al 2017c), RRD∗ (Liao et al 2018), TextSnake (Long et al 2018), PixelLink (Deng et al 2018), FTSN (Dai et al 2017), IncepText (Yang et al 2018), Baseline, and Ours. Only two recall values survive from the extracted table: 51.6 and 76.8.
- Results on Scene Text Benchmarks.
- Detecting multilingual text: the authors first pretrain the proposed network on SynthText for one epoch, then fine-tune it on the 9,000 MLT train and val images for 40 epochs.
- With a single scale of 848 (short edge), the proposed method achieves an F-measure of 70.0%, outperforming state-of-the-art methods by more than 3%.
- By merging the results of two scales, the F-measure rises to 74.1%, which outperforms all competing methods by at least 1.7%.
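The F-measure figures above are harmonic means of precision and recall, so reducing false positives raises precision and, through it, the F-measure. The sketch below illustrates that relationship only; the function names and the counts in the usage lines are hypothetical and are not taken from the paper's evaluation.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) from detection counts; 0.0 when undefined."""
    detected = true_positives + false_positives
    ground_truth = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: same recall, but the second detector emits fewer false positives.
p_many_fp, r_many_fp = precision_recall(true_positives=80, false_positives=20, false_negatives=20)
p_few_fp, r_few_fp = precision_recall(true_positives=80, false_positives=5, false_negatives=20)

print(f_measure(p_many_fp, r_many_fp))
print(f_measure(p_few_fp, r_few_fp))
```

With recall held fixed, the second call yields the higher F-measure, which is the effect a false-positive suppression step aims for.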
- The authors have presented a shape robust text detector that can detect text with arbitrary shapes.
- It is an end-to-end trainable framework with semantic segmentation guidance.
- The authors are interested in multiple directions: (1) The authors will attempt to integrate the Re-Score mechanism into the network in an end-to-end manner. (2) The authors are interested in exploring the method on other multi-oriented or curved object detection tasks, such as those in aerial scenes. (3) The authors will investigate more efficient and faster text detection networks that can run on mobile phones.
- Table 1: Effectiveness of several modules on the ICDAR2017 MLT incidental scene text location task
- Table 2: Effectiveness of several methods on the ICDAR2017 MLT incidental scene text location task. ∗ means multi-scale test
- Table 3: Effectiveness of several methods on ICDAR2015. ∗ means multi-scale test
- Table 4: Effectiveness of several methods on ICDAR2013. ∗ means multi-scale test
- Table 5: Effectiveness of several methods on the Total-Text dataset. Note that EAST and SegLink were not fine-tuned on Total-Text, so their results are included only for reference
- Scene text detection, as one of the most important problems in computer vision, has been extensively studied. Most of the previous deep learning methods can be roughly divided into two branches: segmentation-based text detection and regression-based text detection.
Mainstream segmentation-based approaches are inspired by fully convolutional networks (FCN) (Long, Shelhamer, and Darrell 2015). Zhang et al (2016) first use an FCN to extract text blocks and then detect character candidates from those blocks with MSER. Yao et al (2016) treat a text region as consisting of three parts (text/non-text, character classes, and character linking orientations) and use them as labels for an FCN. PixelLink (Deng et al 2018) performs text/non-text and link prediction on an input image, then applies post-processing to obtain text boxes and filter noise. PSENet (Li et al 2018) finds text kernels and uses progressive scale expansion to locate text boundaries. Peng et al (2017b) argue that large kernels can help boost semantic segmentation performance. The main difference between these methods lies in how they generate labels for the text. Segmentation-based approaches often need time-consuming post-processing steps, while the resulting performance is still unsatisfactory.
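To make the post-processing step concrete, here is a minimal sketch of one generic way to turn a text/non-text score map into boxes: threshold the map, group 4-connected text pixels, and drop small components as noise. It is an illustration under assumed names and parameters (`extract_text_boxes`, `threshold`, `min_area`), not the procedure used by PixelLink, PSENet, or the summarized paper.

```python
from collections import deque


def extract_text_boxes(score_map, threshold=0.5, min_area=3):
    """Return axis-aligned boxes (x_min, y_min, x_max, y_max) for connected text regions."""
    height = len(score_map)
    width = len(score_map[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if visited[y][x] or score_map[y][x] < threshold:
                continue
            # Breadth-first search over 4-connected pixels above the threshold.
            queue = deque([(y, x)])
            visited[y][x] = True
            area = 0
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not visited[ny][nx]
                            and score_map[ny][nx] >= threshold):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Components smaller than min_area are treated as noise and discarded.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy score map: one 2x2 region, one isolated noisy pixel, one horizontal strip.
toy_map = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.9, 0.0],
]
print(extract_text_boxes(toy_map))  # [(0, 0, 1, 1), (1, 3, 3, 3)]
```

Even this simple version visits every pixel and needs hand-tuned thresholds, which hints at why such post-processing can become costly and brittle on full-resolution images.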
- This work was supported by the National Natural Science Foundation of China under Grant number 61771346
- Ch’ng, C. K., and Chan, C. S. 2017. Total-text: A comprehensive dataset for scene text detection and recognition. In ICDAR. IEEE.
- Dai, Y.; Huang, Z.; Gao, Y.; Xu, Y.; Chen, K.; Guo, J.; and Qiu, W. 2017. Fused text segmentation networks for multi-oriented scene text detection. In ICPR.
- Deng, D.; Liu, H.; Li, X.; and Cai, D. 2018. Pixellink: Detecting scene text via instance segmentation. AAAI.
- Divvala, S. K.; Hoiem, D.; Hays, J. H.; Efros, A. A.; and Hebert, M. 2009. An empirical study of context in object detection. In CVPR.
- Gupta, A.; Vedaldi, A.; and Zisserman, A. 2016. Synthetic data for text localisation in natural images. In CVPR.
- He, K.; Gkioxari, G.; Dollar, P.; and Girshick, R. 2017a. Mask r-cnn. In ICCV.
- He, P.; Huang, W.; He, T.; Zhu, Q.; Qiao, Y.; and Li, X. 2017b. Single shot text detector with regional attention. In ICCV.
- He, W.; Zhang, X.-Y.; Yin, F.; and Liu, C.-L. 2017c. Deep direct regression for multi-oriented scene text detection. ICCV.
- Hu, H.; Zhang, C.; Luo, Y.; Wang, Y.; Han, J.; and Ding, E. 2017. Wordsup: Exploiting word annotations for character based text detection. In ICCV.
- Karatzas, D.; Shafait, F.; Uchida, S.; Iwamura, M.; i Bigorda, L. G.; Mestre, S. R.; Mas, J.; Mota, D. F.; Almazan, J. A.; and De Las Heras, L. P. 2013. Icdar 2013 robust reading competition. In ICDAR.
- Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V. R.; Lu, S.; et al. 2015. Icdar 2015 competition on robust reading. In ICDAR.
- Li, Y.; Qi, H.; Dai, J.; Ji, X.; and Wei, Y. 2016. Fully convolutional instance-aware semantic segmentation. arXiv preprint arXiv:1611.07709.
- Li, X.; Wang, W.; Hou, W.; Liu, R.-Z.; Lu, T.; and Yang, J. 2018. Shape robust text detection with progressive scale expansion network. arXiv preprint arXiv:1806.02559.
- Liao, M.; Shi, B.; Bai, X.; Wang, X.; and Liu, W. 2017. Textboxes: A fast text detector with a single deep neural network. In AAAI.
- Liao, M.; Zhu, Z.; Shi, B.; Xia, G.-s.; and Bai, X. 2018. Rotation-sensitive regression for oriented scene text detection. In CVPR.
- Liao, M.; Shi, B.; and Bai, X. 2018. Textboxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; and Berg, A. C. 2016. Ssd: Single shot multibox detector. In ECCV.
- Liu, Z.; Lin, G.; Yang, S.; Feng, J.; Lin, W.; and Goh, W. L. 2018. Learning markov clustering networks for scene text detection. In CVPR.
- Long, S.; Ruan, J.; Zhang, W.; He, X.; Wu, W.; and Yao, C. 2018. Textsnake: A flexible representation for detecting text of arbitrary shapes. In ECCV.
- Long, J.; Shelhamer, E.; and Darrell, T. 2015. Fully convolutional networks for semantic segmentation. In CVPR.
- Lyu, P.; Liao, M.; Yao, C.; Wu, W.; and Bai, X. 2018a. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In ECCV.
- Lyu, P.; Yao, C.; Wu, W.; Yan, S.; and Bai, X. 2018b. Multi-oriented scene text detection via corner localization and region segmentation. In CVPR.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; and Xue, X. 2018. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia.
- Nayef, N.; Yin, F.; Bizid, I.; Choi, H.; Feng, Y.; Karatzas, D.; Luo, Z.; Pal, U.; Rigaud, C.; Chazalon, J.; et al. 2017. Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In ICDAR. IEEE.
- Oliva, A., and Torralba, A. 2007. The role of context in object recognition. Trends in cognitive sciences.
- Peng, C.; Xiao, T.; Li, Z.; Jiang, Y.; Zhang, X.; Jia, K.; Yu, G.; and Sun, J. 2017a. Megdet: A large mini-batch object detector. CVPR.
- Peng, C.; Zhang, X.; Yu, G.; Luo, G.; and Sun, J. 2017b. Large kernel matters–improve semantic segmentation by global convolutional network. In CVPR.
- Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS.
- Shi, B.; Bai, X.; and Belongie, S. 2017. Detecting oriented text in natural images by linking segments. CVPR.
- Tian, Z.; Huang, W.; He, T.; He, P.; and Qiao, Y. 2016. Detecting text in natural image with connectionist text proposal network. In ECCV.
- Yang, Q.; Cheng, M.; Zhou, W.; Chen, Y.; Qiu, M.; and Lin, W. 2018. Inceptext: A new inception-text module with deformable psroi pooling for multi-oriented scene text detection. IJCAI.
- Yao, C.; Bai, X.; Sang, N.; Zhou, X.; Zhou, S.; and Cao, Z. 2016. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; and Sang, N. 2018. Bisenet: Bilateral segmentation network for real-time semantic segmentation. ECCV.
- Zhang, Z.; Zhang, C.; Shen, W.; Yao, C.; Liu, W.; and Bai, X. 2016. Multi-oriented text detection with fully convolutional networks. In CVPR.
- Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; and Liang, J. 2017. East: an efficient and accurate scene text detector. In CVPR.