Scene Text Detection with Supervised Pyramid Context Network.

National Conference on Artificial Intelligence (2019)

Abstract

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments...

Introduction
  • Reading text in the wild, as a fundamental task in the field of computer vision, has been widely studied.
  • Most previous works mainly focus on several challenging issues in natural scene text detection, such as multi-oriented text (Lyu et al 2018b), large aspect ratios (Liao et al 2018), and difficulty in separating adjacent text instances (Deng et al 2018).
  • The first challenge is false positives (FP).
  • Some specific scenarios such as autonomous driving require high precision in text detection.
  • TextSnake (Long et al 2018) uses ordered disks to represent curved text, but it still needs time-consuming and complicated post-processing.
Highlights
  • Because of large differences between foreground text and background objects, wide variation in text shape, color, font, orientation, and scale, and extreme illumination and occlusion, many challenges remain for text detection in natural scenes
  • We propose a shape robust text detector guided by semantic information
  • We have presented a shape robust text detector that can detect text with arbitrary shapes
  • We will investigate more efficient text detection networks that run on mobile phones
Results
  • Results on Scene Text Benchmarks

    Detecting multilingual text. The authors first pretrain the proposed network on SynthText for one epoch and then fine-tune it for 40 epochs on the 9,000 training and validation images of ICDAR2017 MLT.
  • With a single test scale (short edge of 848 pixels), the proposed method achieves an F-measure of 70.0%, outperforming state-of-the-art methods by more than 3%.
  • By merging the results of two test scales, the F-measure reaches 74.1%, outperforming all competing methods by at least 1.7%.
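F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R), so every false positive lowers the score through precision. The sketch below is a minimal illustration, assuming axis-aligned boxes and greedy one-to-one matching at IoU above 0.5; the official ICDAR evaluators use quadrilateral or polygon matching with their own rules, so this is not the benchmark protocol, and the names `iou` and `f_measure` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); assumed axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_measure(detections: List[Box], ground_truth: List[Box],
              thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match each detection (in the given order, ideally sorted by
    confidence) to the unmatched ground-truth box with the highest IoU above
    `thresh`; return (precision, recall, F-measure)."""
    matched_gt = set()
    true_pos = 0
    for det in detections:
        best_j, best_iou = -1, thresh
        for j, gt in enumerate(ground_truth):
            if j in matched_gt:
                continue
            score = iou(det, gt)
            if score > best_iou:  # strictly above the threshold
                best_j, best_iou = j, score
        if best_j >= 0:
            matched_gt.add(best_j)
            true_pos += 1
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with two ground-truth boxes and two detections of which one is a false positive, precision and recall are both 0.5 and so is the F-measure; removing the false positive alone would raise precision to 1.0.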
Conclusion
  • The authors have presented a shape robust text detector that can detect text with arbitrary shapes.
  • It is an end-to-end trainable framework with semantic segmentation guidance.
  • The authors are interested in several future directions: (1) integrating the Re-Score mechanism into the network in an end-to-end manner; (2) exploring the method on other multi-oriented or curved object detection tasks, such as aerial scenes; (3) investigating more efficient text detection networks that run on mobile phones.
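This summary does not define the Re-Score mechanism. The sketch below is only a hypothetical illustration of the general idea suggested by the text, using semantic segmentation guidance to suppress false positives: each detection's classification score is blended with the mean probability of a global text segmentation map inside that detection's instance mask. The function name `rescore`, the equal `weight`, and the nested-list map and mask representation are all assumptions, not SPCNet's actual formula.

```python
from typing import List, Sequence


def rescore(cls_scores: Sequence[float],
            masks: List[List[List[bool]]],
            text_map: List[List[float]],
            weight: float = 0.5) -> List[float]:
    """Hypothetical re-scoring: blend each instance's classification score
    with the average global text-map probability inside its mask.
    `weight` is an assumed mixing factor, not a value from the paper."""
    new_scores = []
    for score, mask in zip(cls_scores, masks):
        # Collect text-map probabilities at pixels covered by this instance mask.
        inside = [text_map[y][x]
                  for y, row in enumerate(mask)
                  for x, on in enumerate(row) if on]
        seg_score = sum(inside) / len(inside) if inside else 0.0
        new_scores.append(weight * score + (1.0 - weight) * seg_score)
    return new_scores
```

With a text map of `[[0.9, 0.9], [0.1, 0.1]]`, two detections that both have classification score 0.8 receive 0.85 (mask on the top row) and 0.45 (mask on the bottom row), so a detection that the segmentation branch considers background falls below a 0.5 threshold.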
Tables
  • Table 1: Effectiveness of several modules on the ICDAR2017 MLT incidental scene text localization task
  • Table 2: Effectiveness of several methods on the ICDAR2017 MLT incidental scene text localization task. ∗ denotes multi-scale testing
  • Table 3: Effectiveness of several methods on ICDAR2015. ∗ denotes multi-scale testing
  • Table 4: Effectiveness of several methods on ICDAR2013. ∗ denotes multi-scale testing
  • Table 5: Effectiveness of several methods on the Total-Text dataset. Note that EAST and SegLink were not fine-tuned on Total-Text, so their results are included for reference only
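The summary does not say how results from two test scales are merged. A common approach, assumed here, is to map each scale's boxes back to original-image coordinates, pool them, and run greedy non-maximum suppression (NMS). Axis-aligned boxes, the 0.5 IoU threshold, and the helper names `rescale` and `merge_scales` are simplifying assumptions; the detector described here outputs arbitrary-shape masks.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Det = Tuple[Box, float]                  # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def rescale(dets: List[Det], factor: float) -> List[Det]:
    """Map boxes predicted on an image resized by `factor` back to original coordinates."""
    return [((x0 / factor, y0 / factor, x1 / factor, y1 / factor), s)
            for (x0, y0, x1, y1), s in dets]


def merge_scales(per_scale: List[List[Det]], iou_thresh: float = 0.5) -> List[Det]:
    """Pool detections from all scales and keep the highest-scoring box
    among any group overlapping by more than `iou_thresh` (greedy NMS)."""
    pooled = sorted((d for dets in per_scale for d in dets),
                    key=lambda d: d[1], reverse=True)
    kept: List[Det] = []
    for box, score in pooled:
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

A duplicate of the same word found at both scales is suppressed in favor of the more confident copy, while a word found only at the larger scale survives, which is one way a second scale can raise recall.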
Related work
  • Scene text detection, as one of the most important problems in computer vision, has been extensively studied. Most of the previous deep learning methods can be roughly divided into two branches: segmentation-based text detection and regression-based text detection.

    Mainstream segmentation-based approaches are inspired by fully convolutional networks (FCN) (Long, Shelhamer, and Darrell 2015). (Zhang et al 2016) first uses an FCN to extract text blocks and then detects character candidates within those blocks with MSER. (Yao et al 2016) represents each text region with three kinds of labels for an FCN: text/non-text, character classes, and character linking orientations. PixelLink (Deng et al 2018) performs text/non-text and link prediction on an input image, then applies post-processing to obtain text boxes and filter noise. PSENet (Li et al 2018) finds text kernels and uses progressive scale expansion to locate text boundaries. (Peng et al 2017b) argues that large kernels help boost semantic segmentation performance. The main difference among these methods lies in how they generate labels for text. Segmentation-based approaches often need time-consuming post-processing steps, while the performance they obtain is still unsatisfactory.
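The post-processing that segmentation-based detectors rely on can be illustrated with the PSENet idea mentioned above. The following is a minimal sketch of progressive scale expansion on binary kernel maps: seeds come from the connected components of the smallest kernel, then grow breadth-first into each larger kernel, so adjacent instances that touch in the full text map stay separate. The pure-Python grids, 4-connectivity, and function names are simplifying assumptions; PSENet itself operates on network-predicted kernels with its own implementation.

```python
from collections import deque
from typing import List

Grid = List[List[int]]  # binary maps (0/1) or label maps (0 = background)


def connected_components(mask: Grid) -> Grid:
    """Label 4-connected foreground regions of a binary mask as 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_scale_expansion(kernels: List[Grid]) -> Grid:
    """kernels[0] is the smallest kernel, kernels[-1] the full text map.
    Labels grow by BFS into each larger kernel; where two instances meet,
    the pixel keeps whichever label reached it first."""
    labels = connected_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels
```

For a one-row image whose full text map `[1, 1, 1, 1, 1, 1, 1]` covers two touching words, plain connected components would return a single instance, whereas expanding from the small kernel `[1, 0, 0, 0, 0, 0, 1]` yields `[1, 1, 1, 1, 2, 2, 2]`.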
Funding
  • This work was supported by the National Natural Science Foundation of China under Grant number 61771346.
Reference
  • Ch’ng, C. K., and Chan, C. S. 2017. Total-Text: A comprehensive dataset for scene text detection and recognition. In ICDAR. IEEE.
  • Dai, Y.; Huang, Z.; Gao, Y.; Xu, Y.; Chen, K.; Guo, J.; and Qiu, W. 2017. Fused text segmentation networks for multi-oriented scene text detection. In ICPR.
  • Deng, D.; Liu, H.; Li, X.; and Cai, D. 2018. PixelLink: Detecting scene text via instance segmentation. In AAAI.
  • Divvala, S. K.; Hoiem, D.; Hays, J. H.; Efros, A. A.; and Hebert, M. 2009. An empirical study of context in object detection. In CVPR.
  • Gupta, A.; Vedaldi, A.; and Zisserman, A. 2016. Synthetic data for text localisation in natural images. In CVPR.
  • He, K.; Gkioxari, G.; Dollár, P.; and Girshick, R. 2017a. Mask R-CNN. In ICCV.
  • He, P.; Huang, W.; He, T.; Zhu, Q.; Qiao, Y.; and Li, X. 2017b. Single shot text detector with regional attention. In ICCV.
  • He, W.; Zhang, X.-Y.; Yin, F.; and Liu, C.-L. 2017c. Deep direct regression for multi-oriented scene text detection. In ICCV.
  • Hu, H.; Zhang, C.; Luo, Y.; Wang, Y.; Han, J.; and Ding, E. 2017. WordSup: Exploiting word annotations for character based text detection. In ICCV.
  • Karatzas, D.; Shafait, F.; Uchida, S.; Iwamura, M.; i Bigorda, L. G.; Mestre, S. R.; Mas, J.; Mota, D. F.; Almazan, J. A.; and De Las Heras, L. P. 2013. ICDAR 2013 robust reading competition. In ICDAR.
  • Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V. R.; Lu, S.; et al. 2015. ICDAR 2015 competition on robust reading. In ICDAR.
  • Li, Y.; Qi, H.; Dai, J.; Ji, X.; and Wei, Y. 2016. Fully convolutional instance-aware semantic segmentation. arXiv preprint arXiv:1611.07709.
  • Li, X.; Wang, W.; Hou, W.; Liu, R.-Z.; Lu, T.; and Yang, J. 2018. Shape robust text detection with progressive scale expansion network. arXiv preprint arXiv:1806.02559.
  • Liao, M.; Shi, B.; Bai, X.; Wang, X.; and Liu, W. 2017. TextBoxes: A fast text detector with a single deep neural network. In AAAI.
  • Liao, M.; Zhu, Z.; Shi, B.; Xia, G.-s.; and Bai, X. 2018. Rotation-sensitive regression for oriented scene text detection. In CVPR.
  • Liao, M.; Shi, B.; and Bai, X. 2018. TextBoxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing.
  • Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; and Berg, A. C. 2016. SSD: Single shot multibox detector. In ECCV.
  • Liu, Z.; Lin, G.; Yang, S.; Feng, J.; Lin, W.; and Goh, W. L. 2018. Learning Markov clustering networks for scene text detection. In CVPR.
  • Long, S.; Ruan, J.; Zhang, W.; He, X.; Wu, W.; and Yao, C. 2018. TextSnake: A flexible representation for detecting text of arbitrary shapes. In ECCV.
  • Long, J.; Shelhamer, E.; and Darrell, T. 2015. Fully convolutional networks for semantic segmentation. In CVPR.
  • Lyu, P.; Liao, M.; Yao, C.; Wu, W.; and Bai, X. 2018a. Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In ECCV.
  • Lyu, P.; Yao, C.; Wu, W.; Yan, S.; and Bai, X. 2018b. Multi-oriented scene text detection via corner localization and region segmentation. In CVPR.
  • Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; and Xue, X. 2018. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia.
  • Nayef, N.; Yin, F.; Bizid, I.; Choi, H.; Feng, Y.; Karatzas, D.; Luo, Z.; Pal, U.; Rigaud, C.; Chazalon, J.; et al. 2017. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT). In ICDAR. IEEE.
  • Oliva, A., and Torralba, A. 2007. The role of context in object recognition. Trends in Cognitive Sciences.
  • Peng, C.; Xiao, T.; Li, Z.; Jiang, Y.; Zhang, X.; Jia, K.; Yu, G.; and Sun, J. 2017a. MegDet: A large mini-batch object detector. In CVPR.
  • Peng, C.; Zhang, X.; Yu, G.; Luo, G.; and Sun, J. 2017b. Large kernel matters–improve semantic segmentation by global convolutional network. In CVPR.
  • Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS.
  • Shi, B.; Bai, X.; and Belongie, S. 2017. Detecting oriented text in natural images by linking segments. In CVPR.
  • Tian, Z.; Huang, W.; He, T.; He, P.; and Qiao, Y. 2016. Detecting text in natural image with connectionist text proposal network. In ECCV.
  • Yang, Q.; Cheng, M.; Zhou, W.; Chen, Y.; Qiu, M.; and Lin, W. 2018. IncepText: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. In IJCAI.
  • Yao, C.; Bai, X.; Sang, N.; Zhou, X.; Zhou, S.; and Cao, Z. 2016. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002.
  • Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; and Sang, N. 2018. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In ECCV.
  • Zhang, Z.; Zhang, C.; Shen, W.; Yao, C.; Liu, W.; and Bai, X. 2016. Multi-oriented text detection with fully convolutional networks. In CVPR.
  • Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; and Liang, J. 2017. EAST: An efficient and accurate scene text detector. In CVPR.