Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

ICLR, 2020.


Abstract:

Convolutional networks are not aware of an object's geometric variations, which leads to inefficient utilization of model and data capacity. To overcome this issue, recent works on deformation modeling seek to spatially reconfigure the data towards a common arrangement such that semantic recognition suffers less from deformation. This is ...
Introduction
  • The rich diversity of object appearance in images arises from variations in object semantics and deformation.
  • Modern convolutional networks follow an analogous process by making abstractions through local connectivity and weight sharing (Zhang, 2019).
  • Such a mechanism is inefficient, because the emergent representations entangle semantics and deformation rather than treating them as disjoint notions.
  • This consumes extra model and data capacity (Shelhamer et al., 2019).
Highlights
  • The rich diversity of object appearance in images arises from variations in object semantics and deformation
  • Since adapting the theoretical receptive field is not the goal but only a means to adapt the effective receptive field, why not directly tune the effective receptive field to specific data and tasks at runtime? Toward this end, we introduce Deformable Kernels (DKs), a family of novel and generic convolutional operators for deformation modeling.
  • We define our ResNet-50-DW base model by replacing all 3 × 3 convolutions with their depthwise counterparts while doubling the dimension of intermediate channels in all residual blocks. We find it to be a reasonable base model compared to the original ResNet-50, with comparable performance on both tasks.
  • Adaptation of Effective Receptive Fields: To verify our claim that Deformable Kernels adapt effective receptive fields in practice, we show effective receptive field visualizations on a set of images that display different degrees of deformation.
  • We introduced Deformable Kernels (DKs) to adapt effective receptive fields (ERFs) of convolutional networks for object deformation.
  • We proposed to sample kernel values from the original kernel space (a minimal sketch of this resampling follows this list).
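The kernel-space sampling can be made concrete with a small sketch: a learned offset Δk shifts each kernel grid position k, and the kernel value at the fractional location k + Δk is recovered by bilinear interpolation over the original kernel. The NumPy sketch below illustrates this resampling for a single 2D kernel under our reading of the operator; the function and variable names are illustrative and are not taken from the authors' released code.

    import numpy as np

    def bilinear_sample_kernel(W, offsets):
        """Resample a K x K kernel W at fractional locations k + dk.

        W       : (K, K) original kernel ("kernel space").
        offsets : (K, K, 2) learned offsets (dy, dx), one per grid position.
        Returns : (K, K) resampled kernel used for the actual convolution.
        """
        K = W.shape[0]
        out = np.zeros_like(W)
        for i in range(K):
            for j in range(K):
                # Target location, clipped into the original kernel scope.
                y = float(np.clip(i + offsets[i, j, 0], 0, K - 1))
                x = float(np.clip(j + offsets[i, j, 1], 0, K - 1))
                y0, x0 = int(np.floor(y)), int(np.floor(x))
                y1, x1 = min(y0 + 1, K - 1), min(x0 + 1, K - 1)
                wy, wx = y - y0, x - x0
                # Bilinear interpolation over the four nearest kernel entries.
                out[i, j] = ((1 - wy) * (1 - wx) * W[y0, x0]
                             + (1 - wy) * wx * W[y0, x1]
                             + wy * (1 - wx) * W[y1, x0]
                             + wy * wx * W[y1, x1])
        return out

    # Toy usage: a 3 x 3 kernel resampled with small random offsets.
    W = np.arange(9, dtype=np.float32).reshape(3, 3)
    offsets = 0.3 * np.random.randn(3, 3, 2)
    print(bilinear_sample_kernel(W, offsets))

Note that the resampling touches only the K × K kernel entries rather than the whole feature map, which keeps the added cost small compared to resampling the input as in Deformable Convolution.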
Methods
  • We evaluate our Deformable Kernels (DKs) on image classification using ILSVRC and object detection using the COCO benchmark.
  • For the kernel offset generator, we set its learning rate to be a fraction of that of the main network, which we cross-validate for each base model.
  • We find it important to clip sampling locations inside the original kernel space, such that k + ∆k ∈ K in Equation 7 (a sketch of these training details follows this list).
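Both training details above map directly onto standard framework mechanics. The sketch below assumes a PyTorch model whose kernel offset generator is a submodule named offset_net (a hypothetical name); it places that submodule's parameters in a separate optimizer group with a fractional learning rate, and the clamp at the end mirrors the constraint k + ∆k ∈ K.

    import torch
    import torch.nn as nn

    # Hypothetical model: a backbone plus a small generator that predicts kernel offsets.
    class ToyDKModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = nn.Conv2d(3, 16, 3, padding=1)
            self.offset_net = nn.Conv2d(3, 2 * 3 * 3, 3, padding=1)  # (dy, dx) per 3x3 kernel tap

    model = ToyDKModel()
    base_lr, offset_lr_frac = 0.1, 0.01  # the fraction is cross-validated per base model

    offset_params = list(model.offset_net.parameters())
    offset_ids = {id(p) for p in offset_params}
    main_params = [p for p in model.parameters() if id(p) not in offset_ids]

    optimizer = torch.optim.SGD(
        [{"params": main_params, "lr": base_lr},
         {"params": offset_params, "lr": base_lr * offset_lr_frac}],
        momentum=0.9)

    # Clip sampling locations into the original kernel scope K (a 3x3 grid indexed
    # over [-1, 1]) so that k + dk never leaves the kernel space.
    kernel_grid = torch.tensor([[[float(i), float(j)] for j in (-1, 0, 1)] for i in (-1, 0, 1)])  # (3, 3, 2)
    predicted_offsets = 0.5 * torch.randn(3, 3, 2)
    sample_locs = (kernel_grid + predicted_offsets).clamp(-1.0, 1.0)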
Conclusion
  • We introduced Deformable Kernels (DKs) to adapt effective receptive fields (ERFs) of convolutional networks for object deformation.
  • We proposed to sample kernel values from the original kernel space.
  • This in effect samples the ERF in linear networks and roughly generalizes to non-linear cases (a minimal linear-case derivation follows this list).
  • We instantiated two variants of DKs and validated our designs, showing connections to previous works.
  • We found consistent improvements over these methods and compatibility with them, as illustrated in our visualizations.
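To make the linear-case claim concrete, consider two stacked 1-D convolutions with kernels W^{(1)} and W^{(2)} and no nonlinearity (a simplified setting, not the paper's exact notation), and define the ERF as the gradient of an output unit with respect to the input:

    y(p) = \sum_{j \in \mathcal{K}} W^{(2)}(j)\, h(p + j), \qquad
    h(q) = \sum_{i \in \mathcal{K}} W^{(1)}(i)\, x(q + i),

    \frac{\partial y(p)}{\partial x(p + d)}
      = \sum_{\substack{i, j \in \mathcal{K} \\ i + j = d}} W^{(1)}(i)\, W^{(2)}(j).

The ERF at displacement d is therefore a sum of products of kernel values, so shifting where kernel values are sampled (k → k + ∆k) directly reshapes the ERF. With nonlinearities, data-dependent gating terms (e.g., ReLU masks) enter these products, so the correspondence holds only approximately.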
Tables
  • Table 1: Ablations of scope size and different instantiations of DK for image classification. Using a proper scope size and more DK layers boosts performance. Modeling an individual kernel offset grid for each data entry is also beneficial.
  • Table 2: Comparisons to strong baselines for image classification. DKs perform comparably or superiorly to previous methods. Further combinations yield consistent gains, suggesting orthogonal and compatible working mechanisms.
  • Table 3: Ablations for object detection. Results are consistent with those for image classification.
  • Table 4: Comparisons to strong baselines for object detection. DKs alone fall short of Deformable Convolution, but the combination still improves performance.
  • Table 5: Network architecture of our ResNet-50-DW compared to the original ResNet-50. Inside the brackets is the general shape of a residual block, including filter sizes and feature dimensionalities; the number of stacked blocks in each stage appears outside the brackets. “G = 128” denotes a depthwise convolution with 128 input channels. The two models have similar numbers of parameters and FLOPs, while depthwise convolutions improve the computational efficiency of our Deformable Kernels (see the sketch below).
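For reference, the "G = 128" design in Table 5 can be read as a standard bottleneck whose 3 × 3 convolution is depthwise (groups equal to its channel count) with the intermediate width doubled relative to ResNet-50. The PyTorch block below is an illustrative reconstruction from the table for the first stage (256 → 128 → 256 channels), not the authors' released code.

    import torch.nn as nn

    class DWBottleneck(nn.Module):
        """Bottleneck with a depthwise 3x3 and doubled intermediate width (sketch)."""
        def __init__(self, in_ch=256, mid_ch=128, out_ch=256):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 1, bias=False),  # 1x1 reduce; mid width doubled vs. ResNet-50
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1, groups=mid_ch, bias=False),  # 3x3 depthwise, "G = 128"
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, out_ch, 1, bias=False),  # 1x1 expand
                nn.BatchNorm2d(out_ch),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Identity residual connection, as in the original ResNet design.
            return self.relu(x + self.block(x))

Because each channel of a depthwise layer owns a single small kernel, resampling that kernel (as DK does) stays cheap, which is the efficiency point the table caption makes.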
Contributions
  • Argues that the awareness of deformations emerges from adaptivity, the ability to adapt at runtime
  • Shows how different 3 × 3 convolutions interact with deformations of two images
  • Introduces Deformable Kernels, a family of novel and generic convolutional operators for deformation modeling
  • Shows that DKs can work orthogonally and complementarily with previous techniques
References
  • Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. TPAMI, 2013.
  • Taco Cohen and Max Welling. Group equivariant convolutional networks. In ICML, 2016.
  • Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. In ICCV, 2017.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Carlos Esteves, Christine Allen-Blanchette, Xiaowei Zhou, and Kostas Daniilidis. Polar transformer networks. In ICLR, 2018.
  • James J. Gibson. The perception of the visual world. Houghton Mifflin, 1950.
  • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018.
  • Drew A. Hudson and Christopher D. Manning. Learning by abstraction: The neural state machine. arXiv preprint arXiv:1907.03950, 2019.
  • Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In NeurIPS, 2015.
  • Xu Jia, Bert De Brabandere, Tinne Tuytelaars, and Luc Van Gool. Dynamic filter networks. In NeurIPS, 2016.
  • Angjoo Kanazawa, Abhishek Sharma, and David W. Jacobs. Locally scale-invariant convolutional neural networks. In NeurIPS Workshop, 2016.
  • Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
  • Xiang Li, Wenhai Wang, Xiaolin Hu, and Jian Yang. Selective kernel networks. In CVPR, 2019.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
  • Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
  • David G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.
  • Wenjie Luo, Yujia Li, Raquel Urtasun, and Richard Zemel. Understanding the effective receptive field in deep convolutional neural networks. In NeurIPS, 2016.
  • Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
  • Ignacio Rocco, Relja Arandjelović, and Josef Sivic. Convolutional neural network architecture for geometric matching. In CVPR, 2017.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
  • Evan Shelhamer, Dequan Wang, and Trevor Darrell. Blurring the line between structure and learning to optimize and adapt receptive fields. arXiv preprint arXiv:1904.11487, 2019.
  • Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In CVPR, 2013.
  • Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J. Guibas. KPConv: Flexible and deformable convolution for point clouds. In ICCV, 2019.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
  • Dequan Wang, Evan Shelhamer, Bruno Olshausen, and Trevor Darrell. Dynamic scale inference by entropy optimization. arXiv preprint arXiv:1908.03182, 2019.
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
  • Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Harmonic networks: Deep translation and rotation equivariance. In CVPR, 2017.
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
  • Yuwen Xiong, Mengye Ren, Renjie Liao, Kelvin Wong, and Raquel Urtasun. Deformable filter convolution for point cloud reasoning. arXiv preprint arXiv:1907.13079, 2019.
  • Brandon Yang, Gabriel Bender, Quoc V. Le, and Jiquan Ngiam. Soft conditional computation. arXiv preprint arXiv:1904.04971, 2019.
  • Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019.
  • Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. In CVPR, 2019.