Progressive Attention Guided Recurrent Network for Salient Object Detection
CVPR, pp. 714-722, 2018.
Abstract:
Effective convolutional features play an important role in saliency estimation, but how to learn powerful features for saliency is still a challenging task. FCN-based methods directly apply multi-level convolutional features without distinction, which leads to sub-optimal results due to the distraction from redundant details. In this paper, we propose a progressive attention guided recurrent network that selectively integrates multi-level contextual information through channel-wise and spatial attention mechanisms, and we introduce multi-path recurrent connections through which global semantic information from the top convolutional layer is transferred to shallower layers, intrinsically refining the entire network. Experimental results on six benchmark datasets demonstrate that the algorithm performs favorably against the state-of-the-art approaches.
Introduction
- Salient object detection, which simulates the human vision system to judge the importance of image regions, has received increasing attention in recent years.
- Conventional saliency methods usually utilize hand-crafted low-level features such as color, intensity, and contrast to predict saliency.
- Approaches based on these low-level features have great difficulty detecting salient objects in complex scenarios.
- Convolutional Neural Networks (CNNs), which intelligently extract high-level and multi-scale complex representations from raw images directly, have achieved superior performance in many vision tasks.
- How to design a reasonable network that can learn effective features, and how to process these features for saliency estimation, are the key issues to be addressed.
Highlights
- Salient object detection, which simulates the human vision system to judge the importance of image regions, has received increasing attention in recent years
- Due to the semantic information obtained from high-level features, Convolutional Neural Networks (CNNs) based saliency detection approaches have successfully broken the bottleneck of hand-crafted features
- Based on channel-wise and spatial attention mechanisms, we propose a progressive attention driven framework, which selectively integrates multi-level contextual information
- Motivated by the above-mentioned methods, we find that effective features are of great importance to saliency detection
- Stimulated by the success of attention in these vision tasks, we propose a progressive attention guided network which generates attentive features by channel-wise and spatial attention mechanisms sequentially
- We propose a novel progressive attention guided recurrent network, which selectively integrates contextual information from multi-level features to generate powerful attentive features
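The highlights above describe attentive features produced by channel-wise attention followed by spatial attention. Below is a minimal PyTorch sketch of that two-step scheme; the squeeze-and-excitation-style channel branch, the 1×1-convolution spatial branch, and all layer sizes are illustrative assumptions rather than the paper's exact module design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSpatialAttention(nn.Module):
    """Channel-wise attention followed by spatial attention (illustrative sketch)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel branch: squeeze with global average pooling, excite with a small MLP.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial branch: a 1x1 convolution produces a single-channel attention map.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Step 1: channel-wise attention reweights feature channels.
        ca = self.channel_fc(F.adaptive_avg_pool2d(x, 1).view(b, c)).view(b, c, 1, 1)
        x = x * ca
        # Step 2: spatial attention reweights locations of the channel-attended features.
        sa = torch.sigmoid(self.spatial_conv(x))
        return x * sa


# Example: attentive features for a 64-channel side output.
feats = torch.randn(2, 64, 44, 44)
attended = ChannelSpatialAttention(64)(feats)  # same shape as `feats`
```

A module of this kind would typically be applied at several feature levels, matching the progressive, multi-level integration described above.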
Results
- Datasets: To evaluate the performance of the algorithm, the authors conduct experiments on six benchmark datasets: ECSSD [32], HKU-IS [13], THUR15K [3], PASCAL-S [17], DUT-OMRON [33] and DUTS [26].
- All input images are resized to 353×353.
- Evaluation Metrics: The authors adopt precision-recall (PR) curves, F-measure, mean absolute error (MAE) and the recently proposed S-measure [6] as the evaluation metrics.
- The F-measure, which is an overall performance measurement, is defined as Fβ = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall), where β² is set to 0.3 to emphasize precision (see the sketch below).
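As a companion to the metric definitions above, here is a minimal NumPy sketch of MAE and an adaptive-threshold F-measure. Binarizing the saliency map at twice its mean value and using β² = 0.3 are common conventions in this literature and are assumptions here, not necessarily the exact protocol of the paper; PR curves and the S-measure [6] are omitted.

```python
import numpy as np


def mae(sal, gt):
    """Mean absolute error between a saliency map and the ground truth, both in [0, 1]."""
    return np.abs(sal.astype(np.float64) - gt.astype(np.float64)).mean()


def adaptive_f_measure(sal, gt, beta2=0.3):
    """F-measure with the common adaptive threshold (twice the mean saliency value)."""
    sal = sal.astype(np.float64)
    gt = gt.astype(np.float64) > 0.5          # binary ground-truth mask
    pred = sal >= min(2.0 * sal.mean(), 1.0)  # binarized prediction
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return (1.0 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)


# Example with random maps of the paper's 353x353 input size.
sal = np.random.rand(353, 353)
gt = (np.random.rand(353, 353) > 0.7).astype(np.float64)
print(mae(sal, gt), adaptive_f_measure(sal, gt))
```

In practice these per-image scores are averaged over all images of a dataset before being reported.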
Conclusion
- The authors propose a novel progressive attention guided recurrent network, which selectively integrates contextual information from multi-level features to generate powerful attentive features.
- By introducing multi-path recurrent connections, global semantic information from the top convolutional layer is utilized to guide the feature learning of shallower layers, which intrinsically refines the entire network (see the sketch below).
- Extensive evaluations demonstrate the effectiveness of the network.
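To make the multi-path recurrent idea concrete, the following is a minimal PyTorch-style sketch under simplifying assumptions: the backbone is a plain list of stages, each shallower stage receives the upsampled top-layer features through a hypothetical 1×1 feedback convolution, and the network is unrolled for two passes. The paper's actual architecture and fusion details may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def recurrent_forward(stages, feedback_convs, image, num_steps=2):
    """Run backbone stages with multi-path recurrent feedback (illustrative only).

    stages:         ordered list of backbone blocks (shallow -> deep).
    feedback_convs: one 1x1 conv per stage, mapping top-layer channels to that
                    stage's input channels (hypothetical design choice).
    After the first pass, features from the deepest layer are upsampled and
    added to the input of every stage, and the backbone is run again.
    """
    top = None
    for _ in range(num_steps):
        h = image
        for stage, fb in zip(stages, feedback_convs):
            if top is not None:
                # Transfer global semantics from the top layer to this (shallower) stage.
                h = h + fb(F.interpolate(top, size=h.shape[-2:], mode="bilinear",
                                         align_corners=False))
            h = stage(h)
        top = h  # deepest features of this pass guide the next pass
    return top


# Toy three-stage backbone, only to illustrate shape compatibility.
stages = [nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU())
          for cin, cout in [(3, 16), (16, 32), (32, 64)]]
feedback_convs = [nn.Conv2d(64, cin, 1) for cin in (3, 16, 32)]
out = recurrent_forward(stages, feedback_convs, torch.randn(1, 3, 64, 64))
```

In practice the stages would be the convolutional blocks of a pretrained backbone, and the feedback paths would target intermediate feature maps rather than the raw input.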
Tables
- Table1: MAE (lower is better) and F-measure (higher is better) comparisons with 13 methods on 6 benchmark datasets. The best three results are shown in red, green, and blue fonts respectively. Our algorithm ranks first on almost all datasets
- Table2: Ablation analysis using F-measure, S-measure and MAE metrics. The results of the top two are shown in red and green
Related work
- In this section, we briefly introduce the related work in three aspects. First, several representative salient object detection methods are reviewed. Then we describe the application of attention mechanisms in various vision tasks. Finally, we compare our multi-path recurrent network with other recurrent-based works.
2.1. Salient Object Detection
Salient object detection methods can be categorized into conventional approaches based on low-level hand-crafted features [20, 4, 33, 15, 9, 38, 23, 21] and approaches driven by Convolutional Neural Networks [25, 13, 37, 14, 12, 16, 27, 18, 7, 35, 36, 28]. Most traditional saliency methods rely on low-level, manually designed features such as color and region contrast; detailed introductions can be found in the recent survey [1]. In this paper, we put more emphasis on CNN-based approaches.
Funding
- This work was supported by the Natural Science Foundation of China under Grants 61725202 and 61472060.
Study subjects and analysis
benchmark datasets: 6
Experiments are conducted on six benchmark datasets: ECSSD [32], HKU-IS [13], THUR15K [3], PASCAL-S [17], DUT-OMRON [33] and DUTS (the test set, which contains 5019 images) [26]; the implementation is based on Caffe [8]. Through multi-path recurrent connections, global semantic information from the top convolutional layer is transferred to shallower layers, which intrinsically refines the entire network, and experimental results on these six datasets demonstrate that the algorithm performs favorably against the state-of-the-art approaches.
datasets: 4
Figures: illustration of the multi-path recurrent connections (PAG); P-R curves and F-measure curves compared with other state-of-the-art methods; average precision, recall, and F-measure scores across four datasets, on which the proposed method performs best in terms of all metrics; visual comparison between the results and state-of-the-art methods.
Reference
- A. Borji, M. M. Cheng, H. Jiang, and J. Li. Salient object detection: A benchmark. TIP, 24(12):5706–5722, 2015.
- L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, and T.-S. Chua. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In CVPR, 2017.
- M.-M. Cheng, N. J. Mitra, X. Huang, and S.-M. Hu. SalientShape: Group saliency in image collections. The Visual Computer, 30:443–453, 2014.
- M. M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S. M. Hu. Global contrast based salient region detection. TPAMI, 37(3):569–582, 2015.
- X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, and X. Wang. Multi-context attention for human pose estimation. In CVPR, 2017.
- D.-P. Fan, M.-M. Cheng, Y. Liu, T. Li, and A. Borji. Structure-measure: A new way to evaluate foreground maps. In ICCV, 2017.
- Q. Hou, M. M. Cheng, X. Hu, A. Borji, Z. Tu, and P. Torr. Deeply supervised salient object detection with short connections. In CVPR, 2017.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, pages 675–678, 2014.
- H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li. Salient object detection: A discriminative regional feature integration approach. In CVPR, 2013.
- X. Jin, Y. Chen, Z. Jie, J. Feng, and S. Yan. Multi-path feedback recurrent neural networks for scene parsing. In AAAI, pages 4096–4102, 2017.
- J. Kuen, Z. Wang, and G. Wang. Recurrent attentional networks for saliency detection. In CVPR, 2016.
- G. Lee, Y.-W. Tai, and J. Kim. Deep saliency with encoded low level distance map and high level features. In CVPR, 2016.
- G. Li and Y. Yu. Visual saliency based on multiscale deep features. In CVPR, 2015.
- G. Li and Y. Yu. Deep contrast learning for salient object detection. In CVPR, 2016.
- X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang. Saliency detection via dense and sparse reconstruction. In ICCV, 2013.
- X. Li, L. Zhao, L. Wei, M. H. Yang, F. Wu, Y. Zhuang, H. Ling, and J. Wang. DeepSaliency: Multi-task deep neural network model for salient object detection. TIP, 25(8):3919–3930, 2016.
- Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille. The secrets of salient object segmentation. In CVPR, 2014.
- N. Liu and J. Han. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, 2016.
- J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
- F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung. Saliency filters: Contrast based filtering for salient region detection. In CVPR, pages 733–740, 2012.
- Y. Qin, H. Lu, Y. Xu, and H. Wang. Saliency detection via cellular automata. In CVPR, 2015.
- M. Simon, E. Rodner, and J. Denzler. ImageNet pre-trained models with batch normalization. arXiv preprint, 2016.
- N. Tong, H. Lu, X. Ruan, and M.-H. Yang. Salient object detection via bootstrap learning. In CVPR, 2015.
- F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang. Residual attention network for image classification. In CVPR, 2017.
- L. Wang, H. Lu, X. Ruan, and M.-H. Yang. Deep networks for saliency detection via local estimation and global search. In CVPR, 2015.
- L. Wang, H. Lu, Y. Wang, M. Feng, D. Wang, B. Yin, and X. Ruan. Learning to detect salient objects with image-level supervision. In CVPR, 2017.
- L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan. Saliency detection with recurrent fully convolutional networks. In ECCV, pages 825–841, 2016.
- T. Wang, A. Borji, L. Zhang, P. Zhang, and H. Lu. A stagewise refinement model for detecting salient objects in images. In ICCV, 2017.
- T. Wang, L. Zhang, H. Lu, C. Sun, and J. Qi. Kernelized subspace ranking for saliency detection. In ECCV, pages 450–466, 2016.
- H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, pages 451–466, 2016.
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
- Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In CVPR, 2013.
- C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In CVPR, 2013.
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
- P. Zhang, D. Wang, H. Lu, H. Wang, and X. Ruan. Amulet: Aggregating multi-level convolutional features for salient object detection. In ICCV, 2017.
- P. Zhang, D. Wang, H. Lu, H. Wang, and B. Yin. Learning uncertain convolutional features for accurate saliency detection. In ICCV, 2017.
- R. Zhao, W. Ouyang, H. Li, and X. Wang. Saliency detection by multi-context deep learning. In CVPR, 2015.
- W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In CVPR, 2014.