Enhancing Model Parallelism in Neural Architecture Search for Multidevice System

IEEE Micro (2020)

Abstract
Neural architecture search (NAS) finds favorable network topologies for better task performance. Existing hardware-aware NAS techniques target only reducing inference latency on single-CPU/GPU systems, and the searched models can hardly be parallelized. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining high task accuracy. ColocNAS defines a new search space with elaborated connectivity to reduce device communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling, which constructs a lookup table of the inference latency of various networks for online runtime approximation; 2) differentiable latency-aware NAS, which simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device placement fine-tuning to further reduce the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness in reducing inference latency while preserving task accuracy.
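The second phase combines a task-error term with a latency term that is made differentiable via the profiled lookup table. A minimal sketch of this idea (all names, weights, and numbers here are illustrative assumptions, not taken from the paper): each layer holds architecture parameters over candidate operations, and the layer's expected latency is the softmax-weighted sum of each operation's profiled latency.

```python
import numpy as np

def expected_latency(alpha, latency_table):
    """Differentiable latency estimate for one layer: softmax over the
    architecture parameters `alpha` (one per candidate op), weighted by
    each op's profiled latency from the offline lookup table."""
    w = np.exp(alpha - alpha.max())  # numerically stable softmax
    w /= w.sum()
    return float(w @ latency_table)

def nas_objective(task_error, alphas, latency_tables, lam=0.1):
    """Illustrative combined objective: task error plus a latency
    penalty summed over layers, traded off by `lam` (hypothetical)."""
    latency = sum(expected_latency(a, t)
                  for a, t in zip(alphas, latency_tables))
    return task_error + lam * latency

# Example: two candidate ops with profiled latencies 2 ms and 4 ms;
# uniform architecture weights give an expected latency of 3 ms.
alphas = [np.zeros(2)]
tables = [np.array([2.0, 4.0])]
print(nas_objective(1.0, alphas, tables, lam=0.1))  # 1.0 + 0.1 * 3.0 = 1.3
```

Because the latency term is a smooth function of the architecture parameters, it can be minimized jointly with task error by gradient descent, which is the core of differentiable latency-aware NAS.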
Keywords
model parallelism,neural architecture search,multidevice system,favorable network topologies,task performance,existing hardware-aware NAS techniques,searched model,synchronization-aware,end-to-end NAS framework,parallelizable neural networks,high task accuracy,search space,device communication,latency-aware NAS,simultaneously minimizes inference latency,task error,deployed model,ColocNAS's effectiveness