AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

European Conference on Computer Vision, pp. 457–473 (2020)


Abstract

Scene text spotting aims to detect and recognize the entire word or sentence with multiple characters in natural images. It is still challenging because ambiguity often occurs when the spacing between characters is large or the characters are evenly spread in multiple rows and columns, making many visually plausible groupings of the characters.

Introduction
  • Text analysis in unconstrained scene images like text detection and text recognition is important in many applications, such as document recognition, license plate recognition, and visual question answering based on texts.
  • This work addresses one of these important challenges: reducing ambiguous bounding box proposals in scene text detection.
  • These ambiguous proposals widely occur when the spacing of the characters of a word is large or multiple text lines are juxtaposed in different rows or columns in an image.
  • As shown in Fig. 1(c), these vision-based text detectors are unable to correctly detect text lines in ambiguous samples.
Highlights
  • Text analysis in unconstrained scene images like text detection and text recognition is important in many applications, such as document recognition, license plate recognition, and visual question answering based on texts
  • In the re-scoring step, we propose a language module (LM) that learns a linguistic representation to re-score candidate text lines and eliminate ambiguity, so that text lines corresponding to natural language receive higher scores than those that do not.
  • Linguistic representation is utilized in scene text detection to deal with the problem of text detection ambiguity.
  • LM can effectively lower the scores of incorrect text lines while improving the scores of correct proposals.
  • Extensive experiments demonstrate the advantages of our method, especially in scenarios of text detection ambiguity.
Methods
  • Fig. 3 shows the overall architecture of AE TextSpotter, which consists of two vision-based modules and one language-based module, namely the text detection module (TDM), the character-based recognition module (CRM), and the language module (LM).
  • Among these modules, TDM and CRM detect the bounding boxes and recognize the content of candidate text lines, while LM lowers the scores of incorrect text lines by utilizing linguistic features, making it the key module for removing ambiguous candidates (a minimal sketch of this re-scoring idea follows this list).
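The bullets above summarize the mechanism only at a high level. The following minimal sketch (plain Python, not the authors' implementation) illustrates the re-scoring idea under stated assumptions: a hypothetical character-bigram scorer stands in for the learned language module, and a fixed mixing weight alpha stands in for the learned re-scoring, simply to show how a linguistic score can demote visually plausible but non-linguistic groupings of characters.

import math
from collections import Counter

def train_bigram_scorer(corpus):
    # Length-normalized character-bigram log-likelihood; a crude stand-in
    # for the paper's learned language module (LM).
    bigrams, unigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    vocab = max(len(unigrams), 1)

    def score(text):
        if len(text) < 2:
            return float("-inf")
        logp = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(text, text[1:]))
        return logp / (len(text) - 1)

    return score

def rescore(candidates, lm_score, alpha=0.5):
    # Blend each candidate's visual score with a normalized linguistic score.
    # `alpha` is a hypothetical mixing weight, not a value from the paper.
    ling = {t: lm_score(t) for t, _ in candidates}
    finite = [v for v in ling.values() if v != float("-inf")]
    lo, hi = (min(finite), max(finite)) if finite else (0.0, 1.0)
    span = (hi - lo) or 1.0

    def norm(v):
        return 0.0 if v == float("-inf") else (v - lo) / span

    mixed = [(t, (1 - alpha) * v + alpha * norm(ling[t])) for t, v in candidates]
    return sorted(mixed, key=lambda p: p[1], reverse=True)

if __name__ == "__main__":
    # Candidate text lines as (text, visual score): a correct word, a version
    # with the characters spread apart, and a fragment of the word.
    scorer = train_bigram_scorer(["scene text spotting", "reading text in berlin"])
    candidates = [("berlin", 0.90), ("b e r l i n", 0.91), ("ber", 0.89)]
    for text, s in rescore(candidates, scorer):
        print(f"{s:.3f}  {text!r}")

In AE TextSpotter itself, the linguistic score is produced by the trained language module rather than a bigram model, and it re-scores the candidate text lines detected by TDM and recognized by CRM.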
Results
  • The authors carefully select a set of extremely ambiguous samples from the IC19-ReCTS dataset, where the approach surpasses other methods by more than 4%.
  • On TDA-ReCTS, the model with LM obtains an F-measure of 81.39% and a 1-NED of 51.32%, significantly surpassing the model without LM by 3.46% and 3.57%, respectively.
  • As shown in Table 5, AE TextSpotter achieves an F-measure of 91.80% and a 1-NED of 71.81%, surpassing other methods (an illustrative sketch of these metrics follows this list).
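For readers unfamiliar with the reported numbers, the sketch below computes the F-measure and 1-NED under their common definitions: F is the harmonic mean 2PR/(P+R) of precision P and recall R, and 1-NED averages one minus the edit distance normalized by the longer string length over ground-truth/prediction pairs. This is an illustrative reading of the metric names only; see [32] for the exact competition protocol.

def f_measure(precision, recall):
    # Harmonic mean of precision and recall: F = 2PR / (P + R).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def one_minus_ned(pairs):
    # Average of 1 - edit_distance / max(len(gt), len(pred)) over string pairs.
    total = 0.0
    for gt, pred in pairs:
        denom = max(len(gt), len(pred)) or 1
        total += 1.0 - levenshtein(gt, pred) / denom
    return total / len(pairs)

if __name__ == "__main__":
    print(f_measure(0.90, 0.80))                                      # 0.847... (example values)
    print(one_minus_ned([("BERLIN", "BERLIN"), ("BERLIN", "BERL")]))  # 0.833...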
Conclusion
  • As demonstrated in previous experiments, the proposed AE TextSpotter works well in most cases, including scenarios of text detection ambiguity.
  • The authors proposed a novel text spotter, termed AE TextSpotter, which introduces linguistic representation to eliminate ambiguity in text detection.
  • Linguistic representation is utilized in scene text detection to deal with the problem of text detection ambiguity.
  • Extensive experiments demonstrate the advantages of the method, especially in scenarios of text detection ambiguity.
Tables
  • Table1: The proportion of text lines with the problem of text detection ambiguity
  • Table2: The recall of TDM and the number of candidate text lines per image under different post-processing thresholds
  • Table3: The time cost per image and 1-NED of different recognizers
  • Table4: The single-scale results on TDA-ReCTS. “P”, “R”, “F” and “1-NED” mean the precision, recall, F-measure, and normalized edit distance [32], respectively
  • Table5: The single-scale results on the IC19-ReCTS test set. “P”, “R”, “F” and “1-NED” represent the precision, recall, F-measure, and normalized edit distance, respectively. “*” denotes the methods in competition [32], which use extra datasets, multi-scale testing, and model ensemble. “800×” means that the short side of input images is scaled to 800
  • Table6: The time cost of all modules in AE TextSpotter
Related work
  • Scene text detection has long been a research hotspot in computer vision, and deep-learning-based methods have become the mainstream. Tian et al. [27] and Liao et al. [15] successfully adapted object detection frameworks to text detection and achieved good performance on horizontal text. After that, many works [33,24,4,14,17] took the orientation of text lines into consideration, making it possible to detect arbitrarily oriented text lines. More recently, curved text detection has attracted increasing attention, and segmentation-based methods [20,12,30,31] have achieved excellent performance on curved text benchmarks. These methods raise text detection to a high level, but none of them can deal with the ambiguity problem in text detection. In this work, we introduce linguistic features into the text detection module to solve the text detection ambiguity problem.
Funding
  • This work is supported by the Natural Science Foundation of China under Grant 61672273 and Grant 61832008, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant BK20160021, and Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines)
  • Chunhua Shen and his employer received no financial support for the research, authorship and publication of this paper
References
  • Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: Towards accurate text recognition in natural images. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 5076–5084 (2017)
  • Cheng, Z., Xu, Y., Bai, F., Niu, Y., Pu, S., Zhou, S.: Aon: Towards arbitrarily-oriented text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5571–5579 (2018)
  • Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
  • Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: Detecting scene text via instance segmentation. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. IEEE (2009)
  • Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
  • Feng, W., He, W., Yin, F., Zhang, X.Y., Liu, C.L.: Textdragon: An end-to-end framework for arbitrary shaped text spotting. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 9076–9085 (2019)
  • Graves, A., Fernandez, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning. pp. 369–376. ACM (2006)
  • He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision. pp. 2961–2969 (2017)
  • He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • Li, H., Wang, P., Shen, C.: Towards end-to-end text spotting with convolutional recurrent neural networks. arXiv preprint arXiv:1707.03985 (2017)
  • Li, X., Wang, W., Hou, W., Liu, R.Z., Lu, T., Yang, J.: Shape robust text detection with progressive scale expansion network. arXiv preprint arXiv:1806.02559 (2018)
  • Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE transactions on pattern analysis and machine intelligence (2019)
  • Liao, M., Shi, B., Bai, X.: Textboxes++: A single-shot oriented scene text detector. IEEE transactions on image processing 27(8), 3676–3690 (2018)
  • Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: Textboxes: A fast text detector with a single deep neural network. In: AAAI. pp. 4161–4167 (2017)
  • Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2117–2125 (2017)
  • Liu, J., Liu, X., Sheng, J., Liang, D., Li, X., Liu, Q.: Pyramid mask text detector. arXiv preprint arXiv:1903.11800 (2019)
  • Liu, W., Chen, C., Wong, K.Y.K., Su, Z., Han, J.: Star-net: A spatial attention residue network for scene text recognition. In: BMVC. vol. 2, p. 7 (2016)
  • Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: Fots: Fast oriented text spotting with a unified network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 5676–5685 (2018)
  • Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: A flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 20–36 (2018)
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. pp. 8024–8035 (2019)
  • Qin, S., Bissacco, A., Raptis, M., Fujii, Y., Xiao, Y.: Towards unconstrained end-to-end text spotting. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 4704–4714 (2019)
  • Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems. pp. 91–99 (2015)
  • Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. arXiv preprint arXiv:1703.06520 (2017)
  • Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on pattern analysis and machine intelligence 39(11), 2298–2304 (2016)
  • Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: Aster: An attentional scene text recognizer with flexible rectification. IEEE transactions on pattern analysis and machine intelligence (2018)
  • Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: European Conference on Computer Vision. pp. 56–72 (2016)
  • Wang, J., Hu, X.: Gated recurrent convolution neural network for ocr. In: Advances in Neural Information Processing Systems. pp. 335–344 (2017)
  • Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., Shao, S.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9336–9345 (2019)
  • Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., Shen, C.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE International Conference on Computer Vision (2019)
  • Xie, E., Zang, Y., Shao, S., Yu, G., Yao, C., Li, G.: Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp. 9038–9045 (2019)
  • Zhang, R., Zhou, Y., Jiang, Q., Song, Q., Li, N., Zhou, K., Wang, L., Wang, D., Liao, M., Yang, M., et al.: Icdar 2019 robust reading challenge on reading chinese text on signboard. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). pp. 1577–1581. IEEE (2019)
  • Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: East: An efficient and accurate scene text detector. arXiv preprint arXiv:1704.03155 (2017)