Once for All: Train One Network and Specialize it for Efficient Deployment

International Conference on Learning Representations (ICLR), 2020


Abstract

We address the challenging problem of efficient deep learning model deployment across many devices, where the goal is to design neural network architectures that can fit diverse hardware platform constraints: from the cloud to the edge. Most of the traditional approaches either manually design or use neural architecture search (NAS) to…
Introduction
  • Deep Neural Networks (DNNs) deliver state-of-the-art accuracy in many machine learning applications.
  • Designing specialized DNNs for every scenario is expensive in both engineering effort and computation, whether done by hand or with NAS, since such methods must repeat the network design process and retrain the designed network from scratch for each case.
  • This makes them unable to handle the vast number of hardware devices (23.14 billion IoT devices as of 2018) and highly dynamic deployment environments.
Highlights
  • Deep Neural Networks (DNNs) deliver state-of-the-art accuracy in many machine learning applications
  • This paper introduces a new solution to tackle this challenge: designing a once-for-all network that can be directly deployed under diverse architectural configurations, amortizing the training cost.
  • The once-for-all network consistently improves the trade-off between accuracy and latency by a significant margin, especially on GPUs, which have more parallelism.
  • It reveals the insight that using the same model for different deployment scenarios with only the width multiplier modified has a limited impact on efficiency improvement: the accuracy drops quickly as the latency constraint gets tighter.
  • We proposed Once for All (OFA), a new methodology that decouples model training from architecture search for efficient deep learning deployment under a large number of deployment scenarios.
  • To prevent sub-networks of different sizes from interfering with each other, we proposed a progressive shrinking algorithm that enables a large number of sub-networks to achieve the same level of accuracy as training them independently (a schematic sketch of such a schedule follows below).
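The progressive shrinking bullet above describes the schedule only at a high level; below is a minimal Python sketch of how such a schedule could be organized, assuming elastic kernel sizes, depths, and width expansion ratios are unlocked stage by stage. The stage layout, the dimension choices, and the `train_stage`/`sample_subnet` helpers are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a progressive-shrinking training schedule (not the authors'
# exact recipe). The idea: train the full network first, then progressively
# unlock smaller kernel sizes, depths, and widths, so small sub-networks join
# the sampling pool gradually instead of being trained jointly from the start.

import random

# Elastic dimensions the once-for-all network is assumed to support.
FULL = {"kernel": [7], "depth": [4], "width": [6]}
STAGES = [
    {"kernel": [7, 5, 3], "depth": [4],       "width": [6]},        # elastic kernel
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6]},        # + elastic depth
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6, 4, 3]},  # + elastic width
]

def sample_subnet(space, num_units=5):
    """Sample one sub-network configuration (per unit) for a training step."""
    return [
        {dim: random.choice(choices) for dim, choices in space.items()}
        for _ in range(num_units)
    ]

def train_stage(space, steps):
    """Placeholder training loop: sample a sub-network each step and update
    the shared weights (the actual forward/backward pass is omitted)."""
    for _ in range(steps):
        config = sample_subnet(space)
        _ = config  # stands in for updating the shared weights on this config

# 1) Train the largest network alone, 2) then run the shrinking stages.
train_stage(FULL, steps=1000)
for space in STAGES:
    train_stage(space, steps=1000)
```

The point of the staging is that smaller sub-networks only enter the sampling pool after the larger ones are well trained, which is what keeps sub-networks of different sizes from interfering with each other.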
Methods
  • Formally, the once-for-all network weights are trained as $\min_{W_o} \sum_{\text{arch}_i} \mathcal{L}_{\text{val}}\big(C(W_o, \text{arch}_i)\big)$, where $C(W_o, \text{arch}_i)$ denotes a selection scheme that selects part of the model from the once-for-all network $W_o$ to form a sub-network with architectural configuration $\text{arch}_i$ (see the weight-sharing sketch after this list).
  • The overall training objective is to optimize Wo to make each supported sub-network maintain the same level of accuracy as independently training a network with the same architectural configuration.
  • Each unit consists of a sequence of layers where only the first layer has stride 2 if the feature map size decreases (Sandler et al, 2018).
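As a concrete illustration of the selection scheme $C(W_o, \text{arch}_i)$ referenced above, here is a minimal PyTorch sketch in which a toy elastic layer slices its shared weight matrix to a sampled width. The `ElasticLinear` module and the toy loss are assumptions for illustration only; the real once-for-all network applies the same weight-sharing idea to depth, kernel size, and resolution as well.

```python
# Minimal PyTorch sketch of the weight-sharing objective: C(W_o, arch) selects
# part of the shared weights to form a sub-network. Here a toy "elastic" linear
# layer keeps only the first `width` output units of its weight matrix.
import random
import torch
import torch.nn as nn

class ElasticLinear(nn.Module):
    def __init__(self, in_features, max_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, width):
        # Selection scheme C(W_o, arch): slice the shared weights to `width`.
        return x @ self.weight[:width].t() + self.bias[:width]

layer = ElasticLinear(in_features=32, max_out=64)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

# Approximate min_{W_o} sum_i L(C(W_o, arch_i)) by sampling a few widths per step.
for step in range(100):
    x = torch.randn(8, 32)
    loss = 0.0
    for width in random.sample([16, 32, 48, 64], k=2):  # sampled sub-networks
        out = layer(x, width)
        loss = loss + out.pow(2).mean()                  # toy loss stands in for L_val
    opt.zero_grad()
    loss.backward()
    opt.step()
```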
Results
  • Figure 5 summarizes the results of OFA under different FLOPs and Pixel1 latency constraints.
  • Figure 6 reports detailed comparisons between OFA and MobileNetV3 on 6 mobile devices.
  • OFA can produce entire trade-off curves, with many points over a wide range of latency constraints, by training only once (a constraint-sweeping sketch follows after this list).
  • It reveals the insight that using the same model for different deployment scenarios with only the width multiplier modified has a limited impact on efficiency improvement: the accuracy drops quickly as the latency constraint gets tighter.
  • The profiling results are summarized in Figure 8, together with roofline models (Williams et al., 2009).
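The Results bullets note that OFA produces entire trade-off curves from a single training run; the sketch below illustrates that constraint-sweeping specialization step under stated assumptions. `estimate_latency` and `predict_accuracy` are hypothetical stand-ins for a hardware latency model and an accuracy estimator, and plain random search stands in for whatever search strategy is actually run over the once-for-all network's sub-networks.

```python
# Minimal sketch: specialize a trained once-for-all network for a sweep of
# latency budgets without retraining. The latency and accuracy functions below
# are placeholders, not measurements.
import random

def random_config():
    return {
        "depths":  [random.choice([2, 3, 4]) for _ in range(5)],
        "widths":  [random.choice([3, 4, 6]) for _ in range(5)],
        "kernels": [random.choice([3, 5, 7]) for _ in range(5)],
    }

def estimate_latency(cfg):  # hypothetical latency model (ms)
    return sum(d * w * k for d, w, k in
               zip(cfg["depths"], cfg["widths"], cfg["kernels"])) / 10

def predict_accuracy(cfg):  # hypothetical accuracy predictor
    return 70 + 0.3 * sum(cfg["depths"]) + 0.2 * sum(cfg["widths"])

def specialize(latency_budget_ms, num_samples=500):
    """Random search for the best sub-network under one latency budget."""
    best = None
    for _ in range(num_samples):
        cfg = random_config()
        if estimate_latency(cfg) <= latency_budget_ms:
            if best is None or predict_accuracy(cfg) > predict_accuracy(best):
                best = cfg
    return best

# One cheap search per point on the trade-off curve; the shared weights are reused.
tradeoff_curve = {budget: specialize(budget) for budget in (15, 20, 25, 30, 35)}
```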
Conclusion
  • We proposed Once for All (OFA), a new methodology that decouples model training from architecture search for efficient deep learning deployment under a large number of deployment scenarios.
  • Unlike previous approaches that design and train a neural network for each deployment scenario, the once-for-all network is trained once and then specialized for each scenario without repeating the design and training process.
  • (Figure: specialized architectures and measured latency (a) on the Xilinx ZU9EG FPGA and (b) on the Xilinx ZU3EG FPGA, e.g. ZU3EG 4.1 ms (R = 164); the diagram shows a stack of Conv 3x3 and MB1–MB6 3x3 blocks followed by pooling and an FC layer.)
Tables
  • Table 1: ImageNet top-1 accuracy (%) of sub-networks under resolution 224 × 224. “(D = d, W = w, K = k)” denotes a sub-network with d layers in each unit, where each layer has a width expansion ratio w and kernel size k. “Mbv3-L” denotes “MobileNetV3-Large”.
  • Table 2: Comparison with SOTA hardware-aware NAS methods on the Pixel 1 phone. OFA decouples model training from architecture search; the search cost and training cost both stay constant as the number of deployment scenarios grows. “#25” denotes that the specialized sub-networks are fine-tuned for 25 epochs after grabbing weights from the once-for-all network. “CO2e” denotes CO2 emission, calculated based on Strubell et al. (2019). AWS cost is calculated based on the price of on-demand P3.16xlarge instances.
Funding
  • We thank NSF CAREER Award #1943349, MIT-IBM Watson AI Lab, Google-Daydream Research Award, Samsung, Intel, Xilinx, SONY, and the AWS Machine Learning Research Award for supporting this research.

References
  • Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, and Kris M Kitani. N2N learning: Network to network compression via policy gradient reinforcement learning. In ICLR, 2018.
  • Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In AAAI, 2018a.
  • Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-level network transformation for efficient architecture search. In ICML, 2018b.
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR, 2019. URL https://arxiv.org/pdf/1812.00332.pdf.
  • Brian Cheung, Alex Terekhov, Yubei Chen, Pulkit Agrawal, and Bruno Olshausen. Superposition of many models into one. In NeurIPS, 2019.
  • Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. In NeurIPS, 2015.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. Single path one-shot neural architecture search with uniform sampling. arXiv preprint arXiv:1904.00420, 2019.
  • Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NeurIPS, 2015.
  • Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In ECCV, 2018.
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for MobileNetV3. In ICCV, 2019.
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, 2017.
  • Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q Weinberger. Multi-scale dense networks for resource efficient image classification. In ICLR, 2018.
  • Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360, 2016.
  • Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, Simon See, and Yap-Peng Tan. Stochastic downsampling for cost-adjustable inference and improved regularization in convolutional networks. In CVPR, 2018.
  • Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. Runtime neural pruning. In NeurIPS, 2017.
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In ECCV, 2018.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In ICLR, 2019.
  • Lanlan Liu and Jia Deng. Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution. In AAAI, 2018.
  • Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In ICCV, 2017.
  • Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
  • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, 2018.
  • Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
  • Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and policy considerations for deep learning in NLP. In ACL, 2019.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. MnasNet: Platform-aware neural architecture search for mobile. In CVPR, 2019.
  • Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E Gonzalez. SkipNet: Learning dynamic routing in convolutional networks. In ECCV, 2018.
  • Samuel Williams, Andrew Waterman, and David Patterson. Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Technical report, Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, 2009.
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In CVPR, 2019.
  • Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S Davis, Kristen Grauman, and Rogerio Feris. BlockDrop: Dynamic inference paths in residual networks. In CVPR, 2018.
  • Jiahui Yu and Thomas Huang. AutoSlim: Towards one-shot architecture search for channel numbers. arXiv preprint arXiv:1903.11728, 2019a.
  • Jiahui Yu and Thomas Huang. Universally slimmable networks and improved training techniques. In ICCV, 2019b.
  • Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas Huang. Slimmable neural networks. In ICLR, 2019.
  • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
  • Chenzhuo Zhu, Song Han, Huizi Mao, and William J Dally. Trained ternary quantization. In ICLR, 2017.
  • Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.