Auto Learning Attention

NeurIPS 2020

TL;DR: We propose to extend attention modules from the first- or second-order to a higher order, i.e., arranging more basic attention units structurally.

Abstract

Attention modules have been demonstrated effective in strengthening the representation ability of a neural network via reweighting spatial or channel features or stacking both operations sequentially. However, designing the structures of different attention operations requires a bulk of computation and extensive expertise. In this paper, …

Introduction
  • Attention learning has been increasingly incorporated into convolutional neural networks (CNNs) [1], aiming to compact the image representation and strengthen its discriminatory power [2, 3, 4, 5].
  • Either channel attention or spatial attention alone can be treated as first-order attention.
  • The combination of channel attention and spatial attention constitutes second-order attention, which has been proven in benchmarks to produce better performance than either first-order attention alone by modulating the feature maps in both the channel and spatial dimensions [4].
  • Considering the highly variable structures and hyperparameters of basic attention units, exhaustively searching the architecture of a high-order attention module incurs exponential complexity.
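First-order channel attention of the SE type [2] can be sketched in a few lines. The sketch below is illustrative, not the paper's configuration: the bottleneck weights `w1`, `w2` and the reduction ratio `r` are random placeholders standing in for learned parameters.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """First-order (SE-style) channel attention on one feature map.

    x: feature map of shape (C, H, W).
    w1, w2: weights of the bottleneck MLP (C -> C/r -> C).
    """
    # Squeeze: global average pooling over the spatial dimensions.
    s = x.mean(axis=(1, 2))                    # (C,)
    # Excitation: ReLU bottleneck followed by a sigmoid gate.
    z = np.maximum(w1 @ s, 0.0)                # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))     # (C,), values in (0, 1)
    # Reweight each channel of the input feature map.
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the sigmoid gate lies strictly in (0, 1), the module can only attenuate channels, never amplify them; this is exactly the "reweighting channel features" behavior described in the abstract.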
Highlights
  • Attention learning has been increasingly incorporated into convolutional neural networks (CNNs) [1], aiming to compact the image representation and strengthen its discriminatory power [2, 3, 4, 5]
  • The results are summarized in Table 1 and Table 2, which demonstrate that the searched high order group attention (HOGA) outperforms other attention baselines with slightly more computation
  • From Table 1, Table 2, and Table 3, we can see that the HOGA searched by Auto Learning Attention (AutoLA) outperforms other attention modules on CIFAR10 when deployed on highly variable architectures including ResNet, ResNeXt, and PNAS, indicating the consistent superiority of the HOGA searched by AutoLA over previous attention methods
  • We evaluate the usefulness of the searched HOGA module for object detection in this part
  • We present the first attempt to search efficient and effective plug-and-play high order attention modules for various well-established backbone networks
  • We propose a new attention module named high order group attention and search its explicit architecture via a differential method efficiently
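The "differential method" mentioned above follows the DARTS-style continuous relaxation [8]: the discrete choice among candidate attention operations is replaced by a softmax-weighted mixture whose coefficients are learned by gradient descent and then discretized. A minimal sketch, with placeholder operations that are not AutoLA's actual candidate set:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Placeholder candidate operations; AutoLA's real primitives differ.
candidate_ops = [
    lambda x: x,                   # identity
    lambda x: np.maximum(x, 0.0),  # ReLU
    lambda x: 0.5 * x,             # scaling
]

def mixed_op(x, alpha):
    """DARTS-style continuous relaxation: a softmax-weighted sum of
    all candidate operations; alpha are the architecture parameters."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, candidate_ops))

alpha = np.array([0.1, 2.0, -1.0])  # learned architecture parameters
x = np.array([-1.0, 0.5, 2.0])
y = mixed_op(x, alpha)
# Discretization after search: keep the op with the largest weight.
chosen = int(np.argmax(alpha))
print(chosen)  # 1 (ReLU wins for this alpha)
```

Because the mixture is differentiable in `alpha`, architecture parameters and network weights can be optimized jointly by gradient descent, avoiding the exponential cost of exhaustive search noted in the introduction.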
Methods
  • 4.1 Datasets

    Four benchmark datasets, including CIFAR10 [38], CIFAR100 [38], ImageNet ILSVRC2012 [39], and COCO [40], are used for this study.

    4.2 Experiment Setup

    HOGA is a general module which can be integrated into any well-established CNN architecture and is end-to-end trainable along with the backbone.
  • Taking ResNet-20 [1] as an exemplar backbone network, where the base channel number (width) is 16, the authors search the best architecture of the attention module on it and transfer the searched module to ResNet-32 and ResNet-56 for evaluation on CIFAR10 and CIFAR100.
  • To evaluate the capabilities of image classification on larger datasets, the authors transfer the searched attention module to ResNet-18, ResNet-34, ResNet-50, ResNet101 [1], and WiderResNet [41] and train them on ImageNet. When testing on CIFAR100 and ImageNet, the base channel number of the network is set to 64.
Results
  • Image Classification Results on CIFAR10 and CIFAR100: in the evaluation stage on CIFAR10, the entire training set is used, and the network is trained from scratch for 500 epochs with a batch size of 256.
  • The authors perform image classification on the ImageNet dataset to evaluate the searched HOGA module for this more challenging task (see Table 4 for a comparison of different attentions on ImageNet).
  • The authors choose the popular one-stage object detection framework Single-Shot Detector (SSD) [46] and the popular two-stage framework Faster R-CNN [47] + FPN [48], using ResNet-50 with different attention modules (e.g., SE, CBAM, and HOGA) pretrained on the ImageNet dataset as the backbone networks.
  • More implementation details can be found in the supplementary material.
Conclusion
  • The authors present the first attempt to search efficient and effective plug-and-play high order attention modules for various well-established backbone networks.
  • The searched attention module generalizes well on various backbones and outperforms manually designed attentions on many typical computer vision tasks.
  • The authors will formulate the backbone and attention architecture into a unified framework and search their optimal architectures in an alternating or synchronous manner
Tables
  • Table 1: Comparison of different attention modules
  • Table 2: Comparison of different attention modules on CIFAR10
  • Table 3: Comparison of different attention modules on ResNeXt and PNAS
  • Table 4: Comparison of different attentions on ImageNet
  • Table 5: Results of other attention modules on ImageNet
  • Table 6: Experiments with fair settings of parameters and FLOPs and ablation study results on CIFAR10
  • Table 7: Comparison of object detection results on COCO
  • Table 8: Human keypoint detection results (the searched modules perform better than SE for this task; the authors suspect that since the input of the keypoint detection model is a cropped and re-scaled person detection region where the human body is salient, spatial attention may not benefit the model more given the channel …)
Related work
  • Attention mechanism. The attention mechanism was originally introduced in neural machine translation to handle long-range dependencies [15], enabling the model to attend adaptively to important regions within a context. Self-attention was added to CNNs by either using channel attention or non-local relationships across the image [2, 3, 16, 17, 18]. As different feature channels encode different semantic concepts, the squeeze-and-excitation (SE) attention captures channel correlations by selectively modulating the scale of channels [2, 19]. Spatial attention is also explored together with channel attention in [4], resulting in a second-order attention module called CBAM that achieves superior performance. In [19, 20], attention is extended to multiple independent branches, achieving improved performance over the original. In contrast to these handcrafted attention modules, we define the high order group attention and construct the search space accordingly, with SE [2] and CBAM [4] as special instances of it. Consequently, a more effective attention module can be searched automatically, outperforming both SE [2] and CBAM [4] on various vision tasks.
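The spatial stage that CBAM [4] stacks after its channel gate can be sketched as follows. This is a simplified illustration: the real module fuses the two pooled maps with a 7x7 convolution, which the stand-in parameters `w_avg` and `w_max` replace with a per-pixel weighted sum.

```python
import numpy as np

def spatial_attention(x, w_avg=1.0, w_max=1.0):
    """Spatial gate in the spirit of CBAM's second stage.

    x: feature map of shape (C, H, W). CBAM pools over the channel
    axis, fuses the two maps (here a weighted sum instead of a 7x7
    conv), and gates every spatial position with a sigmoid.
    """
    avg_map = x.mean(axis=0)                  # (H, W) channel avg-pool
    max_map = x.max(axis=0)                   # (H, W) channel max-pool
    logits = w_avg * avg_map + w_max * max_map
    gate = 1.0 / (1.0 + np.exp(-logits))      # (H, W), values in (0, 1)
    return x * gate[None, :, :]               # broadcast over channels

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))
y = spatial_attention(x)
print(y.shape)  # (8, 4, 4)
```

Applying a channel gate followed by this spatial gate gives the "second-order" composition discussed above; higher-order modules arrange more such units structurally.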
Funding
  • This work was supported by the National Natural Science Foundation of China under grant 61771397, the China Scholarship Council, the Science and Technology Innovation Committee of Shenzhen Municipality under Grant JCYJ20180306171334997, and Australian Research Council Project FL-170100117.
Code is available at https://github.com/btma48/AutoLA.
References
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Andrea Vedaldi. Gather-excite: Exploiting feature context in convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. Cbam: Convolutional block attention module. In European Conference on Computer Vision (ECCV), 2018.
  • Jongchan Park, Sanghyun Woo, Joon-Young Lee, and In-So Kweon. Bam: Bottleneck attention module. In British Machine Vision Conference (BMVC), 2018.
  • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L Yuille, and Li Fei-Fei. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V Le. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. International Conference on Learning Representations (ICLR), 2019.
  • Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. International Conference on Learning Representations (ICLR), 2017.
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, and Chunhua Shen. Nas-fcos: Fast neural architecture search for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, and Yi Yang. Auto-reid: Searching for a part-aware convnet for person re-identification. In IEEE International Conference on Computer Vision (ICCV), 2019.
  • Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, and Piotr Dollár. On network design spaces for visual recognition. In IEEE International Conference on Computer Vision (ICCV), 2019.
  • Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. Efficient neural architecture search via parameters sharing. International Conference on Machine Learning (ICML), 2018.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), 2017.
  • Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, and Jonathon Shlens. Stand-alone self-attention in vision models. Advances in Neural Information Processing Systems (NIPS), 2019.
  • Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, and Quoc V Le. Attention augmented convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Hengshuang Zhao, Jiaya Jia, and Vladlen Koltun. Exploring self-attention for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R Manmatha, et al. Resnest: Split-attention networks. arXiv preprint arXiv:2004.08955, 2020.
  • Binghui Chen, Weihong Deng, and Jiani Hu. Mixed high-order attention network for person re-identification. In IEEE International Conference on Computer Vision (ICCV), 2019.
  • Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. International Conference on Learning Representations (ICLR), 2017.
  • Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In European Conference on Computer Vision (ECCV), 2018.
  • Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, and Alexey Kurakin. Large-scale evolution of image classifiers. In International Conference on Machine Learning (ICML), 2017.
  • Lingxi Xie and Alan Yuille. Genetic cnn. In IEEE International Conference on Computer Vision (ICCV), 2017.
  • David R So, Chen Liang, and Quoc V Le. The evolved transformer. International Conference on Machine Learning (ICML), 2019.
  • Saining Xie, Alexander Kirillov, Ross Girshick, and Kaiming He. Exploring randomly wired neural networks for image recognition. In IEEE International Conference on Computer Vision (ICCV), 2019.
  • Liam Li and Ameet Talwalkar. Random search and reproducibility for neural architecture search. Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
  • Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. International Conference on Learning Representations (ICLR), 2018.
  • Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In Advances in Neural Information Processing Systems (NIPS), 2018.
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-path nas: Designing hardware-efficient convnets in less than 4 hours. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECMLPKDD), 2019.
  • Xiangxiang Chu, Bo Zhang, Ruijun Xu, and Jixiang Li. Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. arXiv preprint arXiv:1907.01845, 2019.
  • Chaoyang He, Haishan Ye, Li Shen, and Tong Zhang. Milenas: Efficient neural architecture search via mixed-level reformulation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In IEEE International Conference on Computer Vision (ICCV), 2019.
  • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong. Pc-darts: Partial channel connections for memory-efficient differentiable architecture search. International Conference on Learning Representations (ICLR), 2020.
  • Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, and Frank Hutter. Understanding and robustifying differentiable architecture search. In International Conference on Learning Representations (ICLR), 2019.
  • Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning (ICML), 2018.
  • Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Citeseer, Tech. Rep, 2009.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision (ECCV), 2014.
  • S Zagoruyko and N Komodakis. Wide residual networks. British Machine Vision Conference (BMVC), 2016.
  • Fengxiang He, Bohan Wang, and Dacheng Tao. Piecewise linear activations substantially shape the loss surfaces of neural networks. International Conference on Learning Representations, 2020.
  • Fengxiang He, Tongliang Liu, and Dacheng Tao. Control batch size and learning rate to generalize well: Theoretical and empirical evidence. 2019.
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017.
  • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, and Han Hu. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In IEEE International Conference on Computer Vision Workshops, 2019.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In European Conference on Computer Vision (ECCV), 2016.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
  • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.
  • Zhe Chen, Jing Zhang, and Dacheng Tao. Recursive context routing for object detection. International Journal of Computer Vision, pages 1–19, 2020.
  • Bin Xiao, Haiping Wu, and Yichen Wei. Simple baselines for human pose estimation and tracking. In European Conference on Computer Vision (ECCV), 2018.
  • Jing Zhang, Zhe Chen, and Dacheng Tao. Towards high performance human keypoint detection. arXiv preprint arXiv:2002.00537, 2020.
  • Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision (ICCV), 2017.
  • Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. International Conference on Learning Representations (ICLR), 2015.
  • Ari S Morcos, David GT Barrett, Neil C Rabinowitz, and Matthew Botvinick. On the importance of single directions for generalization. International Conference on Learning Representations (ICLR), 2018.
  • Laurens Van Der Maaten. Barnes-hut-sne. International Conference on Learning Representations (ICLR), 2013.