
MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

CVPR, pp.13604-13613, (2020)


Abstract

Despite the blooming success of architecture search for vision tasks in resource-constrained environments, the design of on-device object detection architectures has mostly been manual. The few automated search efforts are either centered around non-mobile-friendly search spaces or not guided by on-device latency. We propose MnasFPN, a…

Introduction
  • Designing neural network architectures for efficient deployment on mobile devices is not an easy task: one has to judiciously trade off the amount of computation with accuracy, while taking into consideration the set of operations that are supported and favored by the devices.
  • Despite significant advances in NAS for image classification, both in the server setting [33, 25] and in the mobile setting [24, 3, 9, 28, 6], relatively few attempts [7, 4, 26] focus on object detection.
  • This is in part because of the additional complexity in the search space of the detection head relative to the backbone.
  • This is a challenging task that few NAS frameworks have demonstrated the ability to handle.
Highlights
  • Designing neural network architectures for efficient deployment on mobile devices is not an easy task: one has to judiciously trade off the amount of computation with accuracy, while taking into consideration the set of operations that are supported and favored by the devices
  • As the connectivity gets thinner, we found that it is helpful to augment the flow of information by adding cell-wide residuals between every output feature and the input feature of matching size, a design reminiscent of the residual connections in the Inverted Residual Block
  • Object detection should be treated as a first-class citizen in NAS
  • Our work searches architectures directly for object detection, and the search is guided by simulated signals of on-device latency
  • We have proposed the MnasFPN search space with two innovations
  • MnasFPN incorporates inverted residual blocks into the detection head, which are well suited to mobile CPUs
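The cell-wide residuals described in the highlights can be illustrated with a minimal sketch. The function name and the shape-matching rule below are hypothetical illustrations, not the paper's implementation: each cell output is summed with an input feature of the same size, if one exists.

```python
import numpy as np

def add_cell_wide_residuals(cell_inputs, cell_outputs):
    # Hypothetical sketch: for each output feature, add the input feature
    # whose shape matches (the "cell-wide residual"); otherwise pass through.
    merged = []
    for out in cell_outputs:
        match = next((f for f in cell_inputs if f.shape == out.shape), None)
        merged.append(out + match if match is not None else out)
    return merged
```

Because only size-matched pairs are connected, the residuals add no parameters and negligible latency, which is presumably why they help as connectivity gets thinner.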
Methods
  • The authors present experimental results to showcase the effectiveness of the proposed MnasFPN search space.
  • The authors report results on COCO object detection.
  • The authors added ablation studies to isolate the effectiveness of every component of the search space design as well as latency-aware search
Results
  • Note that this balancing act must fit within less than 40% of the total latency budget
Conclusion
  • Object detection should be treated as a first-class citizen in NAS.
  • The search process, and more importantly, the search space should both be designed to incorporate knowledge about the targeted platform.
  • The authors' work searches architectures directly for object detection, and the search is guided by simulated signals of on-device latency.
  • First, MnasFPN incorporates inverted residual blocks into the detection head, which are well suited to mobile CPUs. Second, MnasFPN restructures the reshaping and convolution operations in the head to enable efficient merging of information across scales
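A latency-guided search typically folds the measured or simulated latency into the controller's reward. A minimal sketch, assuming the MnasNet-style soft trade-off (reward = accuracy × (latency/target)^w); the function name and the example values are illustrative:

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, w=-0.07):
    # Soft accuracy-latency trade-off as in MnasNet: models slower than the
    # target are penalized smoothly rather than rejected outright.
    # w = -0.07 is the exponent used in the MnasNet paper.
    return accuracy * (latency_ms / target_ms) ** w

# A candidate at the target latency keeps its raw accuracy as reward;
# a slower candidate is discounted, a faster one slightly boosted.
print(latency_aware_reward(0.30, 200.0, 200.0))  # 0.3
```

Feeding simulated latency signals into such a reward is what lets the controller trade accuracy against on-device speed during the search.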
Tables
  • Table1: Search space comparisons. The common search parameters such as merge operations, feature resolutions and connectivity are omitted. See Appendix for detailed search space cardinality calculations
  • Table2: Ablation study of SDO. SDO barely affects the parameter count but reduces both MAdds and latency
  • Table3: MnasFPN variations compared with other mobile detection models on COCO test-dev. Latency numbers marked '*' are remeasured in the same configuration (same benchmarker binary and same device) as the MnasFPN models to ensure fair comparison. Models marked † employ the channel-halving trick [9]. Models marked ‡ were obtained with a depth multiplier of 0.7 on both head and backbone
Related work
  • 2.1. Mobile Object Detection Models

    The most common detection models on mobile devices are manually designed by experts. Among them are single-shot detectors such as YOLO [21], SqueezeDet [29], and Pelee [27], as well as two-stage detectors such as Faster R-CNN [22], R-FCN [5], and ThunderNet [20].

    SSDLite [23] is the most popular lightweight detection head architecture. It replaces the expensive 3×3 full convolutions in the SSD head [17] with separable convolutions to reduce the computational burden on mobile devices. This technique is also employed by NAS-FPNLite [7] to adapt NAS-FPN to mobile devices. SSDLite and NAS-FPNLite are paired with efficient backbones such as MobileNetV3 [9] to produce state-of-the-art mobile detectors. Since we design mobile-friendly detection heads, both SSDLite and NAS-FPNLite are crucial baselines for showcasing our effectiveness.
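The saving from the separable-convolution substitution above is easy to quantify. A minimal sketch counting multiply-adds; the function names and the 20×20, 256-channel example are illustrative, not figures from the paper:

```python
def full_conv_madds(h, w, cin, cout, k=3):
    # Multiply-adds of a full k x k convolution producing an h x w map.
    return h * w * k * k * cin * cout

def separable_conv_madds(h, w, cin, cout, k=3):
    # Depthwise k x k followed by a pointwise 1x1 projection
    # (the SSDLite-style substitution for a full convolution).
    depthwise = h * w * k * k * cin
    pointwise = h * w * cin * cout
    return depthwise + pointwise

# Example: a 3x3, 256 -> 256 layer on a 20x20 feature map.
print(full_conv_madds(20, 20, 256, 256))       # 235929600
print(separable_conv_madds(20, 20, 256, 256))  # 27136000, ~8.7x fewer
```

The ratio approaches k² + small overhead as the channel count grows, which is why the substitution is so effective for wide detection heads.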
Contributions
  • Proposes MnasFPN, a mobile-friendly search space for the detection head, and combines it with latency-aware architecture search to produce efficient object detection models
  • Proposes a search space, MnasFPN, designed for mobile devices where depthwise convolutions are reasonably optimized
  • Finds that it is helpful to augment the flow of information by adding cell-wide residuals between every output feature and the input feature of matching size, a design reminiscent of the residual connections in the IRB
  • Explored Squeeze-Excite, but, much to the authors' surprise, it was not chosen by the NAS controller for top-performing candidates
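For reference, the Squeeze-Excite operation mentioned above (Hu et al.) recalibrates channels with a gating signal pooled from the whole feature map. A minimal NumPy sketch under the usual formulation; the function name, weight shapes, and reduction setup are illustrative:

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    # x: (H, W, C) feature map; w1: (C, C//r), w2: (C//r, C) FC weights.
    s = x.mean(axis=(0, 1))                 # squeeze: global average pool -> (C,)
    e = np.maximum(s @ w1, 0.0)             # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(e @ w2)))  # FC + sigmoid -> per-channel gate
    return x * gate                         # channel-wise rescaling
```

The gate lies in (0, 1) per channel, so the block only attenuates channels; its cost is tiny in MAdds but the global pooling can be unfriendly to some mobile runtimes, which may explain why the controller avoided it.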
References
  • Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018.
  • Han Cai, Chuang Gan, and Song Han. Once for all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791, 2019.
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332, 2018.
  • Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Chunhong Pan, and Jian Sun. DetNAS: Neural architecture search on object detection. arXiv preprint arXiv:1903.10979, 2019.
  • Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object detection via region-based fully convolutional networks. In Advances in Neural Information Processing Systems, pages 379–387, 2016.
  • Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, et al. ChamNet: Towards efficient network design through platform-aware model adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11398–11407, 2019.
  • Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7036–7045, 2019.
  • Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018.
  • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. arXiv preprint arXiv:1905.02244, 2019.
  • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
  • Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7310–7311, 2017.
  • Alexander Kirillov, Ross Girshick, Kaiming He, and Piotr Dollár. Panoptic feature pyramid networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6399–6408, 2019.
  • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
  • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, and Li Fei-Fei. Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 82–92, 2019.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
  • Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
  • Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.
  • Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, and Jian Sun. ThunderNet: Towards real-time generic object detection. arXiv preprint arXiv:1903.11752, 2019.
  • Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
  • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, and Chunhua Shen. NAS-FCOS: Fast neural architecture search for object detection. arXiv preprint arXiv:1906.04423, 2019.
  • Robert J. Wang, Xiang Li, and Charles X. Ling. Pelee: A real-time object detection system on mobile devices. In Advances in Neural Information Processing Systems, pages 1963–1972, 2018.
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10734–10742, 2019.
  • Bichen Wu, Forrest Iandola, Peter H. Jin, and Kurt Keutzer. SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 129–137, 2017.
  • Saining Xie, Alexander Kirillov, Ross Girshick, and Kaiming He. Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569, 2019.
  • Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. NetAdapt: Platform-aware neural network adaptation for mobile applications. In Proceedings of the European Conference on Computer Vision (ECCV), pages 285–300, 2018.
  • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.
  • Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.