The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation.

In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1175-1183.


Abstract

State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.

Introduction
  • Convolutional Neural Networks (CNNs) are driving major advances in many computer vision tasks, such as image classification [29], object detection [25, 24] and semantic image segmentation [20].
  • Fully Convolutional Networks (FCNs) [20, 27] were introduced in the literature as a natural extension of CNNs to tackle per-pixel prediction problems such as semantic image segmentation.
  • FCNs add upsampling layers to standard CNNs to recover the spatial resolution of the input at the output layer.
  • In order to compensate for the resolution loss induced by pooling layers, FCNs introduce skip connections between their downsampling and upsampling paths.
  • Skip connections help the upsampling path recover fine-grained information from the downsampling layers; a minimal sketch of this encoder-decoder pattern follows this list.
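A minimal sketch of this pattern (hypothetical PyTorch code, not the paper's implementation; all layer names and sizes are ours): the decoder upsamples coarse features back to the input resolution and concatenates them with same-resolution features from the downsampling path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Minimal encoder-decoder with one skip connection (illustrative only)."""
    def __init__(self, in_ch=3, n_classes=11):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, 16, 3, padding=1)  # fine features, full resolution
        self.enc2 = nn.Conv2d(16, 32, 3, padding=1)     # coarse features, after pooling
        self.dec = nn.Conv2d(32 + 16, n_classes, 3, padding=1)

    def forward(self, x):
        f1 = F.relu(self.enc1(x))                    # H x W
        f2 = F.relu(self.enc2(F.max_pool2d(f1, 2)))  # H/2 x W/2
        up = F.interpolate(f2, scale_factor=2.0)     # recover spatial resolution
        # skip connection: reuse fine-grained encoder features in the decoder
        return self.dec(torch.cat([up, f1], dim=1))

print(TinyFCN()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 11, 64, 64])
```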
Highlights
  • Convolutional Neural Networks (CNNs) are driving major advances in many computer vision tasks, such as image classification [29], object detection [25, 24] and semantic image segmentation [20]
  • Fully Convolutional Networks (FCNs) [20, 27] were introduced in the literature as a natural extension of CNNs to tackle per-pixel prediction problems such as semantic image segmentation
  • Our fully convolutional DenseNet implicitly inherits the advantages of DenseNets, namely: (1) parameter efficiency, as our network has substantially fewer parameters than other segmentation architectures published for the CamVid dataset; (2) implicit deep supervision, since we tried including additional levels of supervision at different layers of our network without noticeable change in performance; and (3) feature reuse, as all layers can access their preceding layers, both through the iterative concatenation of feature maps in a dense block and through skip connections that enforce connectivity between the downsampling and upsampling paths
  • We have extended DenseNets and made them fully convolutional to tackle the problem of semantic image segmentation
  • The main idea behind DenseNets is captured in dense blocks that perform iterative concatenation of feature maps
  • We designed an upsampling path that mitigates the linear growth in feature maps that would appear in a naive extension of DenseNets; a sketch of both ideas follows this list
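As a rough illustration of both points (a hypothetical PyTorch sketch; the authors' own implementation used Theano/Lasagne, and all names here are ours): a dense block iteratively concatenates the k new feature maps each layer produces, and the transition up only upsamples the maps created inside the preceding block, so the number of feature maps stays bounded in the upsampling path.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees all preceding maps and contributes k (growth rate) new ones."""
    def __init__(self, in_ch, growth_rate=16, n_layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1)
            for i in range(n_layers))

    def forward(self, x):
        new_maps = []
        for conv in self.convs:
            out = torch.relu(conv(x))
            new_maps.append(out)
            x = torch.cat([x, out], dim=1)  # iterative concatenation
        # Return only the maps created inside the block; the caller decides
        # whether to concatenate them with the block input.
        return torch.cat(new_maps, dim=1)

x = torch.randn(1, 48, 32, 32)
new = DenseBlock(48)(x)  # 4 * 16 = 64 new maps; the 48 input maps are excluded
up = nn.ConvTranspose2d(64, 64, 3, stride=2, padding=1, output_padding=1)(new)
# Only these 64 maps are upsampled; the input maps reach the decoder through a
# skip connection instead, which avoids linear growth of upsampled features.
print(up.shape)  # torch.Size([1, 64, 64, 64])
```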
Methods
  • The authors evaluate the method on two urban scene understanding datasets: CamVid [2] and Gatech [22].
  • The authors trained the models from scratch, without using any extra data or post-processing module.
  • The authors report the results using the Intersection over Union (IoU) metric and the global accuracy.
  • For a given class c, predictions o_i and targets y_i, the IoU is defined by $\mathrm{IoU}(c) = \frac{\sum_i (o_i = c \land y_i = c)}{\sum_i (o_i = c \lor y_i = c)}$ (4), where $\land$ denotes a logical and, and $\lor$ a logical or.
  • The authors compute IoU by summing over all the pixels i of the dataset; a sketch of this computation follows this list.
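A minimal NumPy sketch of this metric (variable and function names are ours, not the paper's):

```python
import numpy as np

def iou_per_class(pred, target, n_classes):
    """IoU(c) = sum_i(pred_i == c AND target_i == c) /
                sum_i(pred_i == c OR  target_i == c),
    summed over all pixels i of the dataset."""
    scores = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        scores.append(inter / union if union > 0 else float("nan"))
    return scores

pred = np.array([0, 0, 1, 1])    # toy predictions for 4 pixels
target = np.array([0, 1, 1, 1])  # toy ground truth
print(iou_per_class(pred, target, 2))  # [0.5, 0.666...]
```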
Results
  • The authors achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining.
  • The authors show that such a network can outperform the current state of the art on standard benchmarks for urban scene understanding, without using pretrained parameters or any further post-processing.
  • The authors' model is able to outperform such a state-of-the-art model without requiring any temporal smoothing.
Conclusion
  • The authors' fully convolutional DenseNet implicitly inherits the advantages of DenseNets, namely: (1) parameter efficiency, as the network has substantially fewer parameters than other segmentation architectures published for the CamVid dataset; (2) implicit deep supervision, since the authors tried including additional levels of supervision at different layers of the network without noticeable change in performance; and (3) feature reuse, as all layers can access their preceding layers, both through the iterative concatenation of feature maps in a dense block and through skip connections that enforce connectivity between the downsampling and upsampling paths.

    Recent evidence suggests that ResNets behave like ensembles of relatively shallow networks [35]: "Residual networks avoid the vanishing gradient problem by introducing short paths which can carry gradient throughout the extent of very deep networks".
  • Thanks to their smart connectivity patterns, FC-DenseNets might represent an ensemble of variable-depth networks.
  • This particular ensemble behavior would be very interesting for semantic segmentation models, where the ensemble of different paths throughout the model would capture the multi-scale appearance of objects in urban scenes. In this paper, the authors have extended DenseNets and made them fully convolutional to tackle the problem of semantic image segmentation.
Tables
  • Table 1: Building blocks of fully convolutional DenseNets. From left to right: layer used in the model, Transition Down (TD) and Transition Up (TU). See text for details; a sketch of these blocks follows this list
  • Table 2: Architecture details of the FC-DenseNet103 model used in our experiments. This model is built from 103 convolutional layers. The following notation is used: DB stands for Dense Block, TD for Transition Down, TU for Transition Up, BN for Batch Normalization; m is the total number of feature maps at the end of a block and c is the number of classes
  • Table 3: Results on the CamVid dataset. Note that we trained our own pretrained FCN8 model
  • Table 4: Results on the Gatech dataset
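Following the composition the paper gives for these blocks (a layer is BN, ReLU, 3x3 convolution and dropout with p = 0.2; TD adds a 1x1 convolution and 2x2 max pooling; TU is a 3x3 transposed convolution with stride 2), a hypothetical PyTorch sketch might look like:

```python
import torch.nn as nn

def layer(in_ch, growth_rate):
    """One layer of a dense block: BN -> ReLU -> 3x3 conv -> dropout (p = 0.2)."""
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(),
        nn.Conv2d(in_ch, growth_rate, 3, padding=1), nn.Dropout2d(0.2))

def transition_down(in_ch):
    """TD: BN -> ReLU -> 1x1 conv -> dropout -> 2x2 max pooling (halves resolution)."""
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(),
        nn.Conv2d(in_ch, in_ch, 1), nn.Dropout2d(0.2), nn.MaxPool2d(2))

def transition_up(in_ch):
    """TU: 3x3 transposed convolution with stride 2 (doubles resolution)."""
    return nn.ConvTranspose2d(in_ch, in_ch, 3, stride=2,
                              padding=1, output_padding=1)
```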
Related work
  • Recent advances in semantic segmentation have been devoted to improving architectural designs by (1) improving the upsampling path and increasing the connectivity within FCNs [27, 1, 21, 8]; (2) introducing modules to account for broader context understanding [36, 5, 37]; and/or (3) endowing FCN architectures with the ability to provide structured outputs [16, 5, 38].

    First, different alternatives have been proposed in the literature to address the resolution recovery in the FCN upsampling path, from simple bilinear interpolation [10, 20, 1] to more sophisticated operators such as unpooling [1, 21] or transposed convolutions [20]. Skip connections from the downsampling to the upsampling path have also been adopted to allow for finer information recovery [27]. More recently, [8] presented a thorough analysis of the combination of identity mappings [11] and long skip connections [27] for semantic segmentation.

    Second, approaches that introduce larger context to semantic segmentation networks include [10, 36, 5, 37]. In [10], an unsupervised global image descriptor is computed and added to the feature maps of each pixel. In [36], Recurrent Neural Networks (RNNs) are used to retrieve contextual information by sweeping the image horizontally and vertically in both directions. In [5], dilated convolutions are introduced as an alternative to late CNN pooling layers to capture larger context without reducing the image resolution. Following the same spirit, [37] proposes to provide FCNs with a context module built as a stack of dilated convolutional layers to enlarge the field of view of the network.
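For instance, stacking 3x3 convolutions with dilations 1, 2 and 4 grows the receptive field to 15x15 while preserving resolution. The sketch below is a generic illustration of this idea in PyTorch, not the exact module of [37]:

```python
import torch
import torch.nn as nn

# Dilated 3x3 convolutions enlarge the field of view without pooling:
# the receptive field grows 3 -> 7 -> 15 while resolution is preserved.
context = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, dilation=1),
    nn.Conv2d(64, 64, 3, padding=2, dilation=2),
    nn.Conv2d(64, 64, 3, padding=4, dilation=4))

print(context(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```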
Funding
  • We acknowledge the support of the following agencies for research funding and computing support: Imagia Inc., Spanish projects TRA2014-57088-C2-1-R & 2014-SGR-1506, TECNIOspring-FP7-ACCI grant.
References
  • [1] V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. CoRR, abs/1511.00561, 2015.
  • [2] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmentation and recognition using structure from motion point clouds. In European Conference on Computer Vision (ECCV), 2008.
  • [3] L. Castrejon, Y. Aytar, C. Vondrick, H. Pirsiavash, and A. Torralba. Learning aligned cross-modal representations from weakly aligned data. CoRR, abs/1607.07295, 2016.
  • [4] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. CoRR, abs/1606.00915, 2016.
  • [5] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In International Conference on Learning Representations (ICLR), 2015.
  • [6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • [7] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, et al. Lasagne: First release, Aug. 2015.
  • [8] M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal. The importance of skip connections in biomedical image segmentation. CoRR, abs/1608.04117, 2016.
  • [9] F. Visin and A. Romero. Dataset loaders: a python library to load and preprocess datasets. https://github.com/fvisin/dataset_loaders, 2017.
  • [10] C. Gatta, A. Romero, and J. van de Weijer. Unrolling loopy top-down semantic feedback in convolutional deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) workshop, 2014.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015.
  • [13] G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. CoRR, abs/1608.06993, 2016.
  • [14] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015.
  • [15] A. Kendall, V. Badrinarayanan, and R. Cipolla. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. CoRR, abs/1511.02680, 2015.
  • [16] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In Advances in Neural Information Processing Systems (NIPS), 2011.
  • [17] A. Kundu, V. Vineet, and V. Koltun. Feature space optimization for semantic video segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [18] C. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
  • [19] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), 2014.
  • [20] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [21] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. arXiv preprint arXiv:1505.04366, 2015.
  • [22] S. H. Raza, M. Grundmann, and I. Essa. Geometric context from video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
  • [23] S. H. Raza, M. Grundmann, and I. Essa. Geometric context from video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
  • [24] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.
  • [25] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015.
  • [26] S. R. Richter, V. Vineet, S. Roth, and V. Koltun. Playing for data: Ground truth from computer games. In European Conference on Computer Vision (ECCV), 2016.
  • [27] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
  • [28] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [29] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  • [30] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929-1958, 2014.
  • [31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
  • [32] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
  • [33] T. Tieleman and G. Hinton. RMSprop adaptive learning. In COURSERA: Neural Networks for Machine Learning, 2012.
  • [34] D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Deep End2End Voxel2Voxel prediction. CoRR, abs/1511.06681, 2015.
  • [35] A. Veit, M. J. Wilber, and S. J. Belongie. Residual networks are exponential ensembles of relatively shallow networks. CoRR, abs/1605.06431, 2016.
  • [36] F. Visin, M. Ciccone, A. Romero, K. Kastner, K. Cho, Y. Bengio, M. Matteucci, and A. Courville. ReSeg: A recurrent neural network-based model for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) workshop, 2016.
  • [37] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations (ICLR), 2016.
  • [38] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. Conditional random fields as recurrent neural networks. In International Conference on Computer Vision (ICCV), 2015.
Authors
Simon Jégou
Michal Drozdzal