Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

ICLR 2021.

Auto Seg-Loss is the first general framework for searching surrogate losses for mainstream semantic segmentation metrics.

Abstract:

We propose a general framework for searching surrogate losses for mainstream semantic segmentation metrics. This is in contrast to existing loss functions manually designed for individual metrics. The searched surrogate losses can generalize well to other datasets and networks. Extensive experiments on PASCAL VOC and Cityscapes demonstrate …

Introduction
  • Loss functions are indispensable components in training deep networks, as they drive the feature learning process for various applications with specific evaluation metrics.
  • The cross-entropy loss serves well as an effective surrogate objective function for a variety of tasks concerning categorization.
  • This phenomenon is especially prevalent in image semantic segmentation, where various evaluation metrics have been designed to address diverse tasks focusing on different scenarios.
  • Although cross-entropy and its variants work well for many metrics, the misalignment between network training and evaluation still exists and inevitably leads to performance degradation
Highlights
  • Loss functions are indispensable components in training deep networks, as they drive the feature learning process for various applications with specific evaluation metrics
  • The cross-entropy loss serves well as an effective surrogate objective function for a variety of tasks concerning categorization. This phenomenon is especially prevalent in image semantic segmentation, where various evaluation metrics have been designed to address diverse tasks focusing on different scenarios
  • The results demonstrate that our searched loss functions can be applied to various semantic segmentation networks
  • The introduced Auto Seg-Loss is a powerful framework to search for parameterized surrogate losses for mainstream segmentation evaluation metrics
  • It would be interesting to extend the framework to more tasks, such as object detection, pose estimation, and machine translation
Methods
  • The authors evaluate on the PASCAL VOC 2012 (Everingham et al, 2015) and the Cityscapes (Cordts et al, 2016) datasets.
  • The authors use Deeplabv3+ (Chen et al, 2018) with ResNet-50/101 (He et al, 2016) as the network model.
  • During the surrogate parameter search, the authors randomly sample 1500 training images in PASCAL VOC and 500 training images in Cityscapes to form the hold-out set Shold-out, respectively.
  • The remaining training images form the training set Strain in search.
  • The authors re-train the segmentation networks with ResNet-101 using the searched losses on the full training set and evaluate them on the actual validation set.
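The hold-out split described in the Methods bullets can be sketched as follows (the image IDs and helper name `split_for_search` are illustrative, not from the paper's code):

```python
import random

def split_for_search(train_images, holdout_size, seed=0):
    """Randomly carve a hold-out set (used to score candidate surrogate
    losses with the target metric) out of the training images; the rest
    form the reduced training set used during the search."""
    rng = random.Random(seed)
    holdout = rng.sample(train_images, holdout_size)
    holdout_set = set(holdout)
    train = [img for img in train_images if img not in holdout_set]
    return holdout, train

# illustrative image IDs; on PASCAL VOC the paper uses a 1500-image hold-out set
images = [f"img_{i:05d}" for i in range(10000)]
holdout, train = split_for_search(images, holdout_size=1500)
```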
Conclusion
  • The introduced Auto Seg-Loss is a powerful framework to search for parameterized surrogate losses for mainstream segmentation evaluation metrics.
  • The non-differentiable operators are substituted by their parameterized continuous counterparts.
  • The parameters are optimized to improve the final evaluation metrics with essential constraints.
  • It would be interesting to extend the framework to more tasks, such as object detection, pose estimation, and machine translation
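As a concrete illustration of substituting a non-differentiable operator with a continuous counterpart, the sketch below relaxes the hard prediction indicator inside IoU into a soft, parameterized version (the exponent `theta` is a hypothetical surrogate parameter, not the paper's exact parameterization):

```python
import numpy as np

def soft_iou_loss(probs, onehot, theta=1.0, eps=1e-6):
    """Differentiable IoU surrogate: the hard indicator 1[argmax = c]
    is replaced by a continuous relaxation of the class probabilities.
    probs:  (num_pixels, num_classes) softmax outputs
    onehot: (num_pixels, num_classes) ground-truth one-hot masks"""
    soft_pred = probs ** theta                            # relaxed indicator
    inter = (soft_pred * onehot).sum(axis=0)              # per-class soft intersection
    union = (soft_pred + onehot - soft_pred * onehot).sum(axis=0)
    iou = (inter + eps) / (union + eps)
    return 1.0 - iou.mean()                               # loss = 1 - mean soft IoU

# sanity check: a perfect prediction yields (near-)zero loss
onehot = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss = soft_iou_loss(onehot, onehot)
```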
Summary
  • Objectives:

    In the t-th step, the authors aim to explore the search space around the surrogate parameters obtained at step t − 1.
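The step-wise exploration described under Objectives can be illustrated with a toy sample-evaluate-update loop (the paper optimizes the surrogate parameters with reinforcement learning; everything below, including the toy metric, is a hypothetical stand-in):

```python
import random

def search_step(mean, evaluate, num_samples=8, sigma=0.2, rng=None):
    """One search step: sample surrogate parameters around the current
    mean (the distribution from step t-1), score each candidate, and
    move the mean toward the best-scoring sample. `evaluate` stands in
    for training a proxy network and measuring the target metric on
    the hold-out set."""
    rng = rng or random.Random(0)
    candidates = [[m + rng.gauss(0.0, sigma) for m in mean]
                  for _ in range(num_samples)]
    best = max(candidates, key=evaluate)
    return [0.5 * m + 0.5 * b for m, b in zip(mean, best)]  # smoothed update

# toy metric peaked at theta = (1.0, 2.0), purely illustrative
target = (1.0, 2.0)
score = lambda th: -sum((a - b) ** 2 for a, b in zip(th, target))
mean = [0.0, 0.0]
for t in range(30):
    mean = search_step(mean, score, rng=random.Random(t))
```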
Tables
  • Table1: Revisiting mainstream metrics for semantic segmentation. The metrics with † measure the segmentation accuracy on the whole image. The metrics with ∗ focus on the boundary quality
  • Table2: Performance of different losses on PASCAL VOC and Cityscapes segmentation. The results of each loss function’s target metrics are underlined. Scores within 0.3 of the highest are marked in bold
  • Table3: Generalization of our searched surrogate losses between PASCAL VOC and Cityscapes
  • Table4: Generalization of our searched surrogate losses among different network architectures on PASCAL VOC. The losses are searched with ResNet-50 + DeepLabv3+ on PASCAL VOC
  • Table5: Ablation on search space constraints
  • Table6: Ablation on search proxy tasks
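For reference, the whole-image mIoU metric revisited in Table 1 can be computed from a confusion matrix; a minimal sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIoU from a confusion matrix: IoU_c = TP_c / (TP_c + FP_c + FN_c),
    averaged over classes (a whole-image accuracy metric, cf. Table 1)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou.mean()

gt   = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
miou = mean_iou(pred, gt, num_classes=2)   # (1/2 + 2/3) / 2
```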
Related work
  • Loss function design is an active topic in deep network training (Ma, 2020). In the area of image semantic segmentation, the cross-entropy loss is widely used (Ronneberger et al, 2015; Chen et al, 2018). But the cross-entropy loss is designed to optimize the global accuracy measure (Rahman & Wang, 2016; Patel et al, 2020), which is not aligned with many other metrics. Numerous studies have been conducted to design proper loss functions for the prevalent evaluation metrics. For the mIoU metric, many works (Ronneberger et al, 2015; Wu et al, 2016) incorporate class frequency to mitigate the class-imbalance problem. For the boundary F1 score, the losses at boundary regions are up-weighted (Caliva et al, 2019; Qin et al, 2019) so as to deliver more accurate boundaries. These works carefully analyze the properties of specific evaluation metrics and design the loss functions in a fully handcrafted way, which requires expertise. By contrast, we propose a unified framework for deriving parameterized surrogate losses for various evaluation metrics, in which the parameters are searched automatically by reinforcement learning. The networks trained with the searched surrogate losses deliver accuracy on par with or even superior to those trained with the best handcrafted losses.
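The class-frequency heuristic mentioned above (incorporating class frequency into cross-entropy to mitigate class imbalance) can be sketched as follows; inverse-frequency weighting is one common handcrafted choice, and the helper name is hypothetical:

```python
import numpy as np

def weighted_ce(probs, labels, class_freq, eps=1e-12):
    """Class-frequency-weighted cross-entropy: rare classes receive
    larger weights (inverse frequency, normalized to sum to 1)."""
    weights = 1.0 / np.maximum(np.asarray(class_freq, dtype=float), eps)
    weights = weights / weights.sum()
    w = weights[labels]                                   # per-pixel weight
    p = probs[np.arange(len(labels)), labels]             # prob of the true class
    return float(-(w * np.log(p + eps)).mean())

# misclassifying a rare class costs more under inverse-frequency weights
probs = np.array([[0.6, 0.4]])        # low confidence on the true class 1
labels = np.array([1])
loss_balanced = weighted_ce(probs, labels, class_freq=[0.5, 0.5])
loss_rare     = weighted_ce(probs, labels, class_freq=[0.99, 0.01])
```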
References
  • Maxim Berman, Amal Rannen Triki, and Matthew B. Blaschko. The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4413–4421, 2018.
  • John Burkardt. The truncated normal distribution. Department of Scientific Computing Website, Florida State University, pp. 1–35, 2014.
  • Francesco Caliva, Claudia Iriondo, Alejandro Morales Martinez, Sharmila Majumdar, and Valentina Pedoia. Distance map loss penalty term for semantic segmentation. In International Conference on Medical Imaging with Deep Learning – Extended Abstract Track, 2019.
  • Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
  • Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, 2018.
  • Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3223, 2016.
  • Gabriela Csurka, Diane Larlus, and Florent Perronnin. What is a good evaluation measure for semantic segmentation? In Proceedings of the British Machine Vision Conference (BMVC), 2013.
  • Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, 2015.
  • Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik. Semantic contours from inverse detectors. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 991–998, 2011.
  • Tamir Hazan, Joseph Keshet, and David A. McAllester. Direct loss minimization for structured prediction. In Advances in Neural Information Processing Systems (NIPS), pp. 1594–1602, 2010.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
  • Xin He, Kaiyong Zhao, and Xiaowen Chu. AutoML: A survey of the state-of-the-art. arXiv preprint arXiv:1908.00709, 2019.
  • Thorsten Joachims. A support vector method for multivariate performance measures. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pp. 377–384, 2005.
  • Pushmeet Kohli, Philip H. S. Torr, et al. Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3):302–324, 2009.
  • Chuming Li, Xin Yuan, Chen Lin, Minghao Guo, Wei Wu, Junjie Yan, and Wanli Ouyang. AM-LFS: AutoML for loss function search. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 8410–8419, 2019.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • Jun Ma. Segmentation loss odyssey. arXiv preprint arXiv:2005.13449, 2020.
  • Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE, 2016.
  • Pritish Mohapatra, Michal Rolinek, C. V. Jawahar, Vladimir Kolmogorov, and M. Pawan Kumar. Efficient optimization for rank-based loss functions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3693–3701, 2018.
  • Gattigorla Nagendar, Digvijay Singh, Vineeth N. Balasubramanian, and C. V. Jawahar. Neuro-IoU: Learning a surrogate loss for semantic segmentation. In Proceedings of the British Machine Vision Conference (BMVC), pp. 278, 2018.
  • Yash Patel, Tomas Hodan, and Jiri Matas. Learning surrogates via deep embedding. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
  • Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In Proceedings of the 35th International Conference on Machine Learning (ICML), pp. 4095–4104. PMLR, 2018.
  • Xuebin Qin, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, and Martin Jagersand. BASNet: Boundary-aware salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7479–7489, 2019.
  • Mani Ranjbar, Tian Lan, Yang Wang, Steven N. Robinovitch, Ze-Nian Li, and Greg Mori. Optimizing nondecomposable loss functions in structured prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4):911–924, 2012.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Yang Song, Alexander Schwing, Raquel Urtasun, et al. Training deep neural networks via direct loss minimization. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 2169–2177. PMLR, 2016.
  • Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5693–5703, 2019.
  • Xiaobo Wang, Shuo Wang, Cheng Chi, Shifeng Zhang, and Tao Mei. Loss function search for face recognition. In Proceedings of the 37th International Conference on Machine Learning (ICML). PMLR, 2020.
  • Zifeng Wu, Chunhua Shen, and Anton van den Hengel. Bridging category-level and instance-level semantic image segmentation. arXiv preprint arXiv:1605.06885, 2016.
  • Yisong Yue, Thomas Finley, Filip Radlinski, and Thorsten Joachims. A support vector method for optimizing average precision. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 271–278, 2007.
  • Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890, 2017.
  • Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.