Progressive Attention Guided Recurrent Network for Salient Object Detection

CVPR, pp. 714-722, 2018.


Abstract:

Effective convolutional features play an important role in saliency estimation, but how to learn powerful features for saliency remains a challenging task. FCN-based methods directly apply multi-level convolutional features without distinction, which leads to sub-optimal results due to distraction from redundant details. In this paper...

Introduction
  • Salient object detection, which simulates the human vision system to judge the importance of image regions, has received increasing attention in recent years.
  • Conventional saliency methods usually utilize hand-crafted low-level features such as color, intensity, and contrast to predict saliency.
  • Approaches based on these low-level features have great difficulty detecting salient objects in complex scenarios.
  • Convolutional Neural Networks (CNNs), which intelligently extract high-level and multi-scale complex representations directly from raw images, have achieved superior performance in many vision tasks.
  • How to design a reasonable network that can learn effective features, and how to process these features for saliency estimation, have become the key issues to be addressed.
Highlights
  • Salient object detection, which simulates the human vision system to judge the importance of image regions, has received increasing attention in recent years
  • Due to the semantic information obtained from high-level features, Convolutional Neural Networks (CNNs) based saliency detection approaches have successfully broken the bottleneck of hand-crafted features
  • Based on channel-wise and spatial attention mechanisms, we propose a progressive attention driven framework, which selectively integrates multilevel contextual information
  • Motivated by the above-mentioned methods, we find that effective features are of great importance to saliency detection
  • Stimulated by the success of attention in these vision tasks, we propose a progressive attention guided network which generates attentive features by channel-wise and spatial attention mechanisms sequentially
  • We propose a novel progressive attention guided recurrent network, which selectively integrates contextual information from multi-level features to generate powerful attentive features
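The sequential channel-wise-then-spatial attention idea in the highlights can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the authors' exact PAGR module: the function name, the softmax-based weighting, and the rescaling factors are all illustrative assumptions.

```python
import numpy as np

def channel_spatial_attention(feat):
    """Toy sketch of sequential channel-wise then spatial attention over a
    feature map of shape (C, H, W). Illustrative only, not the paper's module."""
    C, H, W = feat.shape
    # Channel-wise attention: softmax over each channel's global average response.
    channel_logits = feat.mean(axis=(1, 2))                 # (C,)
    channel_weights = np.exp(channel_logits - channel_logits.max())
    channel_weights /= channel_weights.sum()
    feat_c = feat * channel_weights[:, None, None] * C      # reweight channels

    # Spatial attention: softmax over locations of the channel-summed response,
    # computed on the channel-attended features (hence "sequential").
    spatial_logits = feat_c.sum(axis=0).ravel()             # (H*W,)
    spatial_weights = np.exp(spatial_logits - spatial_logits.max())
    spatial_weights /= spatial_weights.sum()
    spatial_weights = spatial_weights.reshape(H, W) * (H * W)
    return feat_c * spatial_weights[None, :, :]             # attentive features
```

The key design point the highlights describe is the ordering: channel attention first selects informative feature maps, then spatial attention highlights salient locations within them.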
Results
  • Datasets: To evaluate the performance of the algorithm, the authors conduct experiments on six benchmark datasets: ECSSD [32], HKU-IS [13], THUR15K [3], PASCAL-S [17], DUT-OMRON [33] and DUTS [26].
  • All input images are resized to 353×353.
  • Evaluation Metrics: The authors adopt precision-recall (PR) curves, F-measure, mean absolute error (MAE) and recently proposed S-measure [6] as the evaluation metrics.
  • The F-measure, an overall performance measurement, is defined as Fβ = ((1 + β²) · Precision · Recall) / (β² · Precision + Recall), where β² is commonly set to 0.3 in the saliency literature to emphasize precision.
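The F-measure and MAE metrics above can be sketched in a few lines of NumPy. This is a minimal reference implementation under common conventions (β² = 0.3, a fixed binarization threshold); function names and the threshold parameter are illustrative, not taken from the paper's code.

```python
import numpy as np

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F-beta of a binarized saliency map against a binary ground truth.
    beta2 = 0.3 is the conventional choice in salient object detection."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

def mae(pred, gt):
    """Mean absolute error between a real-valued saliency map and binary GT."""
    return np.abs(pred - gt.astype(float)).mean()
```

P-R curves are obtained by sweeping `thresh` over [0, 1] and plotting the resulting precision/recall pairs.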
Conclusion
  • The authors propose a novel progressive attention guided recurrent network, which selectively integrates contextual information from multi-level features to generate powerful attentive features.
  • By introducing multi-path recurrent connections, global semantic information is utilized to guide the feature learning procedure of shallower layers, which refines the entire network essentially.
  • Extensive evaluations demonstrate the effectiveness of the proposed network.
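The multi-path recurrent idea in the conclusion can be sketched as a toy loop: at each recurrent step, a global semantic signal from the deepest feature map gates the shallower layers. This is a heavily simplified illustration under the assumption that all layers share the same channel count; it is not the authors' architecture.

```python
import numpy as np

def multi_path_recurrent(features, steps=2):
    """Toy sketch (not the paper's exact design): global semantic information
    from the top layer is broadcast back to every shallower layer at each
    recurrent step. Assumes all feature maps share the same channel count C."""
    feats = [f.copy() for f in features]        # shallow -> deep, each (C, H, W)
    for _ in range(steps):
        # Global semantic signal: channel-wise average of the top feature map.
        semantic = feats[-1].mean(axis=(1, 2))              # (C,)
        gate = 1.0 / (1.0 + np.exp(-semantic))              # sigmoid gate per channel
        for i in range(len(feats) - 1):
            # Modulate each shallower layer's channels with the semantic gate.
            feats[i] = feats[i] * gate[:, None, None]
    return feats
```

The point being illustrated is the "multi-path" aspect: the top-layer signal reaches every shallower layer directly, rather than propagating back one layer at a time.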
Tables
  • Table1: MAE (lower is better) and F-measure (higher is better) comparisons with 13 methods on 6 benchmark datasets. The best three results are shown in red, green, and blue fonts respectively. Our algorithm ranks first on almost all datasets
  • Table2: Ablation analysis using F-measure, S-measure and MAE metrics. The results of the top two are shown in red and green
Related work
  • In this section, we briefly introduce the related works in three aspects. At the beginning, several representative salient object detection methods are reviewed. Then we describe the application of attention mechanisms in various vision tasks. Finally, we compare our multi-path recurrent network with other recurrent based works.

    2.1. Salient Object Detection

    Salient Object Detection methods can be categorized as conventional approaches based on low-level hand-crafted features [20, 4, 33, 15, 9, 38, 23, 21] and approaches driven by Convolutional Neural Networks [25, 13, 37, 14, 12, 16, 27, 18, 7, 35, 36, 28]. Most traditional saliency methods rely on low-level, manually designed features such as color and region contrast; detailed introductions to these methods can be found in a recent survey [1]. In this paper, we put more emphasis on CNN-based approaches.
Funding
  • This work was supported by the Natural Science Foundation of China under Grants 61725202 and 61472060.
Study subjects and analysis
  • Benchmark datasets: 6. To evaluate the performance of the algorithm, the authors conduct experiments on six benchmark datasets: ECSSD [32], HKU-IS [13], THUR15K [3], PASCAL-S [17], DUT-OMRON [33] and DUTS (the testing set, which contains 5019 images) [26].
  • Implementation details: the proposed algorithm is implemented in Caffe [8], and all input images are resized to 353×353.
  • Through multi-path recurrent connections, global semantic information from the top convolutional layer is transferred to shallower layers, which intrinsically refines the entire network. Experimental results on the six benchmark datasets demonstrate that the algorithm performs favorably against state-of-the-art approaches.
  • Figure summary: illustrations of the multi-path recurrent connections and the PAG module; P-R curves and F-measure curves compare the proposed method with other state-of-the-art methods; average precision, recall, and F-measure scores across four datasets show the proposed method performing best on all datasets in terms of all metrics; a visual comparison with state-of-the-art methods is also provided.

Reference
  • A. Borji, M.-M. Cheng, H. Jiang, and J. Li. Salient object detection: A benchmark. TIP, 24(12):5706–5722, 2015.
  • L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, and T.-S. Chua. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In CVPR, 2017.
  • M.-M. Cheng, N. J. Mitra, X. Huang, and S.-M. Hu. SalientShape: Group saliency in image collections. The Visual Computer, 30:443–453, 2014.
  • M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu. Global contrast based salient region detection. TPAMI, 37(3):569–582, 2015.
  • X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, and X. Wang. Multi-context attention for human pose estimation. In CVPR, 2017.
  • D.-P. Fan, M.-M. Cheng, Y. Liu, T. Li, and A. Borji. Structure-measure: A new way to evaluate foreground maps. In ICCV, 2017.
  • Q. Hou, M.-M. Cheng, X. Hu, A. Borji, Z. Tu, and P. Torr. Deeply supervised salient object detection with short connections. In CVPR, 2017.
  • Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM, pages 675–678, 2014.
  • H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li. Salient object detection: A discriminative regional feature integration approach. In CVPR, 2013.
  • X. Jin, Y. Chen, Z. Jie, J. Feng, and S. Yan. Multi-path feedback recurrent neural networks for scene parsing. In AAAI, pages 4096–4102, 2017.
  • J. Kuen, Z. Wang, and G. Wang. Recurrent attentional networks for saliency detection. In CVPR, 2016.
  • G. Lee, Y.-W. Tai, and J. Kim. Deep saliency with encoded low level distance map and high level features. In CVPR, 2016.
  • G. Li and Y. Yu. Visual saliency based on multiscale deep features. In CVPR, 2015.
  • G. Li and Y. Yu. Deep contrast learning for salient object detection. In CVPR, 2016.
  • X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang. Saliency detection via dense and sparse reconstruction. In ICCV, 2013.
  • X. Li, L. Zhao, L. Wei, M.-H. Yang, F. Wu, Y. Zhuang, H. Ling, and J. Wang. DeepSaliency: Multi-task deep neural network model for salient object detection. TIP, 25(8):3919–3930, 2016.
  • Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille. The secrets of salient object segmentation. In CVPR, 2014.
  • N. Liu and J. Han. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, 2016.
  • J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung. Saliency filters: Contrast based filtering for salient region detection. In CVPR, pages 733–740, 2012.
  • Y. Qin, H. Lu, Y. Xu, and H. Wang. Saliency detection via cellular automata. In CVPR, 2015.
  • M. Simon, E. Rodner, and J. Denzler. ImageNet pre-trained models with batch normalization. 2016.
  • N. Tong, H. Lu, X. Ruan, and M.-H. Yang. Salient object detection via bootstrap learning. In CVPR, 2015.
  • F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang. Residual attention network for image classification. In CVPR, 2017.
  • L. Wang, H. Lu, X. Ruan, and M.-H. Yang. Deep networks for saliency detection via local estimation and global search. In CVPR, 2015.
  • L. Wang, H. Lu, Y. Wang, M. Feng, D. Wang, B. Yin, and X. Ruan. Learning to detect salient objects with image-level supervision. In CVPR, 2017.
  • L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan. Saliency detection with recurrent fully convolutional networks. In ECCV, pages 825–841, 2016.
  • T. Wang, A. Borji, L. Zhang, P. Zhang, and H. Lu. A stagewise refinement model for detecting salient objects in images. In ICCV, 2017.
  • T. Wang, L. Zhang, H. Lu, C. Sun, and J. Qi. Kernelized subspace ranking for saliency detection. In ECCV, pages 450–466, 2016.
  • H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, pages 451–466, 2016.
  • K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
  • Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In CVPR, 2013.
  • C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In CVPR, 2013.
  • Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
  • P. Zhang, D. Wang, H. Lu, H. Wang, and X. Ruan. Amulet: Aggregating multi-level convolutional features for salient object detection. In ICCV, 2017.
  • P. Zhang, D. Wang, H. Lu, H. Wang, and B. Yin. Learning uncertain convolutional features for accurate saliency detection. In ICCV, 2017.
  • R. Zhao, W. Ouyang, H. Li, and X. Wang. Saliency detection by multi-context deep learning. In CVPR, 2015.
  • W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In CVPR, 2014.