Automatically Tuning the GCC Compiler to Optimize the Performance of Applications Running on the ARM Cortex-M3.

arXiv: Distributed, Parallel, and Cluster Computing (2017)

Abstract

This paper introduces a novel method for automatically tuning the selection of compiler flags in order to optimize the performance of software that is intended to run on particular hardware platforms. Motivated by the rapid recent expansion of so-called Internet of Things (IoT) devices, we are particularly interested in improving the execution time of code running on embedded system architectures. We demonstrate the effectiveness of our approach on code compiled by the GNU C Compiler (GCC) for the ARM Cortex-M3 (CM3) processor, and we show how our method outperforms the current industry standard -O3 optimization level across a diverse embedded system benchmark suite called BEEBS. We begin by conducting an investigatory study to quantify the potential gains from existing iterative compilation approaches, which perform time-intensive searches for the optimal configuration for each given benchmark. Then we adapt iterative compilation to output a single configuration that optimizes performance across the entire benchmark suite. Although this is a time-consuming process, our approach eventually constructs a simple variation of -O3, which we call -Ocm3, that realizes nearly two-thirds of the known available gains on the CM3 architecture (beyond -O3) and significantly outperforms a far more complex state-of-the-art predictive method. Our approach suggests that 26 flags should be removed from -O3 while three other flags should be added. We analyze in detail two of the former and explain why turning them off improves performance on this processor.
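
The idea of searching for one flag configuration that performs well across a whole suite can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify; it is a simple greedy toggle search shown under stated assumptions. The flag names (FLAG_A, FLAG_B, FLAG_C), the benchmark names, the toy timing model, and the use of a geometric mean as the suite-wide cost are all hypothetical. In practice, the `measure` callback would compile each benchmark with GCC and time it on the target hardware.

```python
import math
from typing import Callable, FrozenSet, Iterable, List, Tuple

# measure(benchmark, flags) returns an execution time (lower is better).
Measure = Callable[[str, FrozenSet[str]], float]


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, so no single long-running benchmark dominates."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def suite_cost(benchmarks: List[str], flags: FrozenSet[str],
               measure: Measure) -> float:
    """Aggregate cost of one flag configuration over the whole suite."""
    return geomean(measure(b, flags) for b in benchmarks)


def tune_suite(benchmarks: List[str], baseline: Iterable[str],
               candidates: List[str],
               measure: Measure) -> Tuple[FrozenSet[str], float]:
    """Greedy search: toggle one flag at a time, keep the change only if
    the suite-wide cost drops, and repeat until no toggle helps."""
    current = frozenset(baseline)
    best = suite_cost(benchmarks, current, measure)
    improved = True
    while improved:
        improved = False
        for flag in candidates:
            trial = current ^ {flag}  # add the flag if absent, else remove it
            cost = suite_cost(benchmarks, trial, measure)
            if cost < best:
                current, best = trial, cost
                improved = True
    return current, best


def toy_measure(benchmark: str, flags: FrozenSet[str]) -> float:
    """Hypothetical timing model standing in for compile-and-run."""
    base = {"bench_a": 100.0, "bench_b": 200.0}[benchmark]
    effect = {"FLAG_A": 1.10, "FLAG_B": 0.95, "FLAG_C": 0.90}
    factor = 1.0
    for flag in flags:
        factor *= effect[flag]
    return base * factor


BENCHMARKS = ["bench_a", "bench_b"]
BASELINE = {"FLAG_A", "FLAG_B"}           # stands in for the -O3 flag set
CANDIDATES = ["FLAG_A", "FLAG_B", "FLAG_C"]

tuned_flags, tuned_cost = tune_suite(BENCHMARKS, BASELINE, CANDIDATES,
                                     toy_measure)
print(sorted(tuned_flags), round(tuned_cost, 2))
```

In this toy model the search drops FLAG_A from the baseline and adds FLAG_C, mirroring in miniature the kind of outcome the abstract describes, where some flags are removed from -O3 and others are added. A real search pays a full compile-and-run for every `measure` call, which is why the process is time-consuming.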
