MAPLE: Microprocessor A Priori for Latency Estimation

IEEE Conference on Computer Vision and Pattern Recognition (2022)

Abstract
Modern deep neural networks must deliver state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent, requiring the NAS algorithm to either measure or predict the architecture's latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE), which leverages hardware characteristics to predict deep neural network latency on previously unseen hardware devices. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. The CPU-specific performance metrics are also able to characterize GPUs, resulting in a versatile descriptor that does not rely on the availability of hardware counters on GPUs or other deep learning accelerators. We provide experimental insight into this novel strategy. Through this hardware descriptor, MAPLE can generalize to new hardware via a few-shot adaptation strategy, requiring as few as 3 samples from the target hardware to yield a 6% improvement over state-of-the-art methods that require as many as 10 samples. Experimental results showed that increasing the few-shot adaptation samples to 10 improves the accuracy significantly over the state-of-the-art methods, by 12%. We also demonstrate that the Pareto-optimal DNN architectures MAPLE identifies exhibit superlative accuracy and efficiency. The proposed technique provides a versatile and practical latency prediction methodology for DNN run-time inference on multiple hardware devices while not imposing any significant overhead for sample collection.
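The few-shot adaptation scheme described above can be sketched as follows: a latency regressor is trained on samples that concatenate a hardware descriptor with architecture features, and is then fine-tuned on a handful of measured samples from the target device. Everything below — the feature layout, the synthetic latency model, and the plain-SGD regressor — is an illustrative assumption for exposition, not the paper's actual implementation or data.

```python
import random

random.seed(0)

def predict(w, x):
    # Linear latency estimate from [bias, hw_descriptor, arch_features].
    return sum(wi * xi for wi, xi in zip(w, x))

def fit(w, data, lr=0.01, epochs=200):
    # Plain SGD on squared error; stands in for the paper's regressor.
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def make_sample(hw_feat):
    # Hypothetical sample: one hardware-descriptor feature plus two
    # architecture features (flops, memory traffic). The "true" latency
    # below is a made-up synthetic model, not measured data.
    flops = random.uniform(0.5, 2.0)
    mem = random.uniform(0.1, 1.0)
    x = [1.0, hw_feat, flops, mem]
    y = 0.5 * hw_feat + 2.0 * flops + 1.0 * mem
    return x, y

# Train on plentiful samples from a "source" device (hw_feat = 1.0) ...
source = [make_sample(1.0) for _ in range(200)]
w = fit([0.0] * 4, source)

# ... then adapt with only 3 measured samples from an unseen
# "target" device (hw_feat = 3.0), mirroring the few-shot setting.
target_few = [make_sample(3.0) for _ in range(3)]
w_adapted = fit(w, target_few, lr=0.05, epochs=100)
```

In this toy setup the source-trained weights carry most of the architecture-to-latency mapping, so the three target samples only need to correct the hardware-specific offset — which is the intuition behind needing so few measurements on new hardware.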
Keywords
latency estimation,microprocessor,maple