
BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation

Computer Vision - ECCV 2018, Part XIII (2018): 334-349


Abstract

Semantic segmentation requires both rich spatial information and a sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path to preserve the spatial information and generate high-resolution features, and a Context Path with a fast down-sampling strategy to obtain a sufficient receptive field; a Feature Fusion Module combines the features of the two paths. With this design, BiSeNet reaches 68.4% Mean IOU on the Cityscapes test dataset at 105 FPS.

Introduction
  • Semantic segmentation, which assigns a semantic label to each pixel, is a fundamental task in computer vision.
  • Instead of resizing the input image, some works prune the channels of the network to boost the inference speed [1, 8, 25], especially in the early stages of the base model.
  • However, this weakens the spatial capacity of the model.
Highlights
  • Semantic segmentation, which assigns a semantic label to each pixel, is a fundamental task in computer vision
  • Based on the above observation, we propose the Bilateral Segmentation Network (BiSeNet) with two parts: Spatial Path (SP) and Context Path (CP)
  • We propose a novel approach that decouples the functions of spatial information preservation and receptive field enlargement into two paths
  • We propose a Spatial Path to preserve the spatial size of the original input image and encode rich spatial information
  • With the Spatial Path and the Context Path, we propose BiSeNet for real-time semantic segmentation, as illustrated in Figure 2(a) and sketched in code after this list
  • With rich spatial details and a large receptive field, we achieve 68.4% Mean IOU on the Cityscapes [9] test dataset at 105 FPS
  • The Spatial Path is designed to preserve the spatial information of the original images
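The highlights describe the two-path design in prose only. Below is a minimal PyTorch sketch of that structure, written for illustration rather than fidelity: the channel widths, the torchvision ResNet-18 backbone standing in for Xception39 (the paper evaluates both Xception39- and Res18-based variants), and the exact placement of the attention modules are assumptions, not the authors' released code (which the paper says will be made publicly available).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class ConvBNReLU(nn.Module):
    """Conv -> BatchNorm -> ReLU, the basic block of this sketch."""
    def __init__(self, in_ch, out_ch, ks=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, ks, stride, ks // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))


class SpatialPath(nn.Module):
    """Three stride-2 convs: output keeps 1/8 resolution and spatial detail."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(ConvBNReLU(3, 64, stride=2),
                                 ConvBNReLU(64, 64, stride=2),
                                 ConvBNReLU(64, out_ch, stride=2))

    def forward(self, x):
        return self.net(x)


class ARM(nn.Module):
    """Attention Refinement Module: global-pool based channel attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = ConvBNReLU(in_ch, out_ch)
        self.att = nn.Sequential(nn.Conv2d(out_ch, out_ch, 1, bias=False),
                                 nn.BatchNorm2d(out_ch), nn.Sigmoid())

    def forward(self, x):
        x = self.conv(x)
        return x * self.att(F.adaptive_avg_pool2d(x, 1))


class ContextPath(nn.Module):
    """Lightweight backbone (ResNet-18 standing in for Xception39) plus a
    global-average-pooling tail for maximal receptive field."""
    def __init__(self, out_ch=128):
        super().__init__()
        net = resnet18(weights=None)
        self.to8 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                                 net.layer1, net.layer2)       # 1/8 resolution
        self.to16, self.to32 = net.layer3, net.layer4          # 1/16, 1/32
        self.arm16, self.arm32 = ARM(256, out_ch), ARM(512, out_ch)
        self.tail = ConvBNReLU(512, out_ch, ks=1)

    def forward(self, x):
        f16 = self.to16(self.to8(x))
        f32 = self.to32(f16)
        ctx = self.tail(F.adaptive_avg_pool2d(f32, 1))         # global context
        up32 = F.interpolate(self.arm32(f32) + ctx, scale_factor=2,
                             mode="bilinear", align_corners=False)
        return F.interpolate(self.arm16(f16) + up32, scale_factor=2,
                             mode="bilinear", align_corners=False)  # back to 1/8


class FFM(nn.Module):
    """Feature Fusion Module: concatenate both paths, reweight channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fuse = ConvBNReLU(in_ch, out_ch, ks=1)
        self.att = nn.Sequential(nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(),
                                 nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid())

    def forward(self, sp, cp):
        x = self.fuse(torch.cat([sp, cp], dim=1))
        return x + x * self.att(F.adaptive_avg_pool2d(x, 1))


class BiSeNetSketch(nn.Module):
    def __init__(self, num_classes=19):                 # 19 Cityscapes classes
        super().__init__()
        self.sp, self.cp = SpatialPath(128), ContextPath(128)
        self.ffm = FFM(256, 256)
        self.head = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        out = self.head(self.ffm(self.sp(x), self.cp(x)))
        return F.interpolate(out, size=x.shape[2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = BiSeNetSketch().eval()          # eval mode: BN on 1x1 maps is safe
    with torch.no_grad():
        print(model(torch.randn(1, 3, 512, 1024)).shape)  # [1, 19, 512, 1024]
```

The point the sketch mirrors is the decoupling named in the highlights: the Spatial Path never down-samples past 1/8, while the Context Path races to 1/32 plus a global pooling vector, so spatial detail and receptive field are bought separately and fused once.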
Methods
  • Table 1 compares the baselines: FCN-32s heads on the Xception39 (185.5M FLOPS) and Res18 backbones, reported in FLOPS, parameters, and Mean IOU (%).
  • Baseline: The authors use the Xception39 network pretrained on the ImageNet dataset [28] as the backbone of the Context Path.
  • The authors evaluate the performance of this base model as the baseline, as shown in Table 1.
  • The authors use the lightweight Xception39 model as the backbone of the Context Path to down-sample quickly.
  • The authors use the U-shape-8s structure, which improves the performance from 60.79% to 66.01%, as shown in Table 2.
  • The authors do not adopt multi-scale testing; all reported Mean IOU values are single-scale (the metric is sketched after this list)
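For reference, Mean IOU as used in these ablations is the per-class intersection-over-union averaged over classes. The following is a standard NumPy implementation of that metric, not code from the paper; the ignore_index value follows the common Cityscapes convention.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes from flattened integer label maps."""
    mask = gt != ignore_index                      # drop ignored pixels
    # Joint histogram: rows = ground-truth class, cols = predicted class.
    hist = np.bincount(num_classes * gt[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(hist)                          # true positives per class
    union = hist.sum(0) + hist.sum(1) - inter      # pred + gt - intersection
    return (inter / np.maximum(union, 1)).mean()   # guard against empty classes

# Toy check with two classes and four pixels:
pred = np.array([0, 0, 1, 1])
gt   = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, 2))                       # (1/2 + 2/3) / 2 = 0.583...
```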
Results
  • The authors adapt a modified Xception model [8], Xception39, to the real-time semantic segmentation task.
  • The authors' implementation code will be made publicly available.
  • The authors evaluate the proposed BiSeNet on Cityscapes [9], CamVid [2] and COCOStuff [3] benchmarks.
  • The authors first introduce the datasets and the implementation protocol.
  • The authors describe the speed strategy in comparison with other methods in detail.
  • The authors evaluate all performance results on the Cityscapes validation set.
  • The authors report the accuracy and speed results on Cityscapes, CamVid and COCO-Stuff.
Conclusion
  • Bilateral Segmentation Network (BiSeNet) is proposed in this paper to improve the speed and accuracy of real-time semantic segmentation simultaneously.
  • The authors' proposed BiSeNet contains two paths: Spatial Path (SP) and Context Path (CP).
  • The Context Path utilizes the lightweight model and global average pooling [6, 21, 40] to obtain sizeable receptive field rapidly.
  • With rich spatial details and a large receptive field, the authors achieve 68.4% Mean IOU on the Cityscapes [9] test dataset at 105 FPS (a generic timing harness for this kind of measurement is sketched below).
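The 105 FPS figure is an inference-speed measurement for a fixed input size on one NVIDIA Titan XP. A benchmark of this kind is usually taken as below; this harness is a generic sketch (warm-up iterations, explicit CUDA synchronization, averaged wall-clock time), not the authors' script, and the iteration counts are arbitrary.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size=(3, 1024, 2048), warmup=10, iters=100, device="cuda"):
    """Average inference FPS for a batch-1 input of the given C x H x W size."""
    model = model.to(device).eval()
    x = torch.randn(1, *size, device=device)
    for _ in range(warmup):
        model(x)                              # warm up kernels / cuDNN autotune
    torch.cuda.synchronize()                  # finish all warm-up work first
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()                  # wait for all queued kernels
    return iters / (time.perf_counter() - start)

# e.g. print(measure_fps(BiSeNetSketch()))    # 2048x1024 input, as in Table 6
```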
Tables
  • Table1: Accuracy and parameter analysis of our baseline models, Xception39 and Res18, on the Cityscapes validation dataset. Here we use FCN-32s as the base structure. FLOPS are estimated for input of 3 × 640 × 360
  • Table2: Speed analysis of the U-shape-8s and the U-shape-4s on one NVIDIA Titan XP card. Image size is W×H
  • Table3: Detailed performance comparison of each component in our proposed BiSeNet. CP: Context Path; SP: Spatial Path; GP: global average pooling; ARM: Attention Refinement Module; FFM: Feature Fusion Module
  • Table4: (caption not recovered)
  • Table5: Speed comparison of our method against other state-of-the-art methods. Image size is W×H. Ours1 and Ours2 are BiSeNet variants based on Xception39 and Res18, respectively
  • Table6: Accuracy and speed comparison of our method against other state-of-the-art methods on the Cityscapes test dataset. We train and evaluate on an NVIDIA Titan XP with 2048×1024 resolution input. “-” indicates that the method did not report the speed corresponding to that accuracy
  • Table7: Accuracy comparison of our method against other state-of-the-art methods on the Cityscapes test dataset. “-” indicates that the method did not report the corresponding result
  • Table8: Accuracy result on the CamVid test dataset. Ours1 and Ours2 indicate the models based on the Xception39 and Res18 networks
  • Table9: Accuracy result on COCO-Stuff validation dataset
Related work
  • Recently, many approaches based on FCN [22] have achieved state-of-the-art performance on different benchmarks of the semantic segmentation task. Most of these methods are designed to encode more spatial information or enlarge the receptive field.

    Spatial information: The convolutional neural network (CNN) [16] encodes high-level semantic information through consecutive down-sampling operations. In the semantic segmentation task, however, the spatial information of the image is crucial to predicting a detailed output, so many modern approaches strive to encode rich spatial information. DUC [32], PSPNet [40], DeepLab v2 [5], and DeepLab v3 [6] use the dilated convolution to preserve the spatial size of the feature map, as the sketch below illustrates. Global Convolution Network [26] utilizes the “large kernel” to enlarge the receptive field.
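To make the dilated-convolution point concrete, the minimal PyTorch comparison below (layer sizes arbitrary) shows that matching the padding to the dilation rate keeps the feature-map size unchanged while the 3×3 kernel samples a wider area.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 64, 128)                    # N x C x H x W feature map

normal  = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # covers 3x3
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # covers 5x5

print(normal(x).shape)   # torch.Size([1, 64, 64, 128])
print(dilated(x).shape)  # torch.Size([1, 64, 64, 128]) -- same resolution,
                         # but each output pixel sees a larger receptive field
```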
Funding
  • This work was supported by the Project of the National Natural Science Foundation of China No. 61433007 and No. 61401170
References
  • 1. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481-2495 (2017)
  • 2. Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: European Conference on Computer Vision. pp. 44-57 (2008)
  • 3. Caesar, H., Uijlings, J., Ferrari, V.: COCO-Stuff: Thing and stuff classes in context. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • 4. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. ICLR (2015)
  • 5. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv (2016)
  • 6. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv (2017)
  • 7. Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: Scale-aware semantic image segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • 8. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 9. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • 10. Ghiasi, G., Fowlkes, C.C.: Laplacian pyramid reconstruction and refinement for semantic segmentation. In: European Conference on Computer Vision (2016)
  • 11. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: International Conference on Artificial Intelligence and Statistics. pp. 315-323 (2011)
  • 12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • 13. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. arXiv (2017)
  • 14. Iandola, F.N., Moskewicz, M.W., Ashraf, K., Han, S., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv abs/1602.07360 (2016)
  • 15. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. pp. 448-456 (2015)
  • 16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems (2012)
  • 17. Li, X., Liu, Z., Luo, P., Loy, C.C., Tang, X.: Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 18. Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: Multi-path refinement networks with identity mappings for high-resolution semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 19. Lin, G., Shen, C., van den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • 20. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: European Conference on Computer Vision. Springer (2014)
  • 21. Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: Looking wider to see better. ICLR (2016)
  • 22. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • 23. Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Neural Information Processing Systems (2014)
  • 24. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: IEEE International Conference on Computer Vision (2015)
  • 25. Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A deep neural network architecture for real-time semantic segmentation. arXiv (2016)
  • 26. Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J.: Large kernel matters - improve semantic segmentation by global convolutional network. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 27. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (2015)
  • 28. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3), 211-252 (2015). https://doi.org/10.1007/s11263-015-0816-y
  • 29. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR (2015)
  • 30. Treml, M., Arjona-Medina, J., Unterthiner, T., Durgesh, R., Friedmann, F., Schuberth, P., Mayr, A., Heusel, M., Hofmarcher, M., Widrich, M., et al.: Speeding up semantic segmentation for autonomous driving. In: Neural Information Processing Systems Workshop (2016)
  • 31. Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 32. Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., Cottrell, G.: Understanding convolution for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • 33. Wu, Z., Shen, C., Hengel, A.v.d.: High-performance semantic segmentation using very deep fully convolutional networks. arXiv abs/1604.04339 (2016)
  • 34. Wu, Z., Shen, C., Hengel, A.v.d.: Real-time semantic image segmentation via spatial sparsity. arXiv (2017)
  • 35. Xie, S., Tu, Z.: Holistically-nested edge detection. In: IEEE International Conference on Computer Vision (2015)
  • 36. Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Learning a discriminative feature network for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • 37. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. ICLR (2016)
  • 38. Zhang, R., Tang, S., Zhang, Y., Li, J., Yan, S.: Scale-adaptive convolutions for scene parsing. In: IEEE International Conference on Computer Vision. pp. 2031-2039 (2017)
  • 39. Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: ICNet for real-time semantic segmentation on high-resolution images. arXiv (2017)
  • 40. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. IEEE Conference on Computer Vision and Pattern Recognition (2017)