An Adaptive Clock Scheme Exploiting Instruction-Based Dynamic Timing Slack for a GPGPU Architecture

IEEE Journal of Solid-State Circuits (2020)

Abstract
This article presents an adaptive clock scheme that exploits instruction-based dynamic timing slack (DTS) in a general-purpose graphics processing unit (GPGPU) architecture. Using the developed transitional static timing analysis, the deterministic DTS of each instruction can be identified at each pipeline stage. A critical path (CP) messenger scheme is designed to monitor the runtime utilization of CPs. Information on the instructions issued in real time and the CP messengers are both used to determine the runtime DTS margin and to guide cycle-by-cycle adjustment of the clock period. To apply the proposed adaptive clock to the GPGPU, a hierarchical clocking scheme is built, consisting of a global phase-locked loop (PLL) and a local delay-locked loop (DLL)-based clock generator inside each compute unit (CU). Each CU core has its own clock domain with adjustable local clocking. In addition, to exploit the error-resilient characteristics of neural networks, an elastic pipeline clocking scheme is developed that redistributes the timing margin across pipeline stages for machine learning computations. Measurement results from the implemented open-source GPGPU architecture in a 65 nm CMOS process demonstrate that exploiting the deterministic instruction-based DTS yields up to an 18% performance improvement, or an equivalent 30% energy saving. The proposed elastic pipeline clocking achieves an additional 8% energy saving with small accuracy degradation for neural network inference.
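The cycle-by-cycle period adjustment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stage names, opcodes, per-stage delay table (a stand-in for the results of the transitional static timing analysis), the DLL period granularity, and the minimum period are all hypothetical values chosen for illustration. In each cycle, the required period is taken as the largest delay among the instructions currently occupying the pipeline stages; a stage whose CP messenger is asserted is assumed to need the full worst-case period; and the result is rounded up to the step size of the local DLL-based clock generator.

```python
"""Minimal sketch of cycle-by-cycle clock period selection driven by
instruction-based dynamic timing slack (DTS).

All numbers, stage names, and opcodes are hypothetical illustrations,
not values from the paper.
"""
import math
from typing import Dict, List, Optional

NOMINAL_PERIOD_NS = 5.0  # assumed worst-case (static timing) clock period
MIN_PERIOD_NS = 2.0      # assumed floor set by register/clock overheads
DLL_STEP_NS = 0.25       # assumed period granularity of the local DLL

# Hypothetical worst-case delay (ns) of each opcode in each stage,
# standing in for the per-instruction results of timing analysis.
STAGE_DELAY_NS: Dict[str, Dict[str, float]] = {
    "fetch":   {"add": 3.2, "mul": 3.2, "fma": 3.2, "ld": 3.2},
    "decode":  {"add": 2.8, "mul": 2.9, "fma": 3.0, "ld": 2.8},
    "execute": {"add": 3.5, "mul": 4.6, "fma": 5.0, "ld": 3.0},
    "mem":     {"add": 1.5, "mul": 1.5, "fma": 1.5, "ld": 4.4},
}


def required_period(stage_ops: Dict[str, Optional[str]],
                    cp_flags: Dict[str, bool]) -> float:
    """Clock period (ns) needed for one cycle.

    stage_ops: opcode held by each pipeline stage (None means a bubble).
    cp_flags:  stages whose hypothetical critical-path messenger is
               asserted; these get no slack and need the nominal period.
    """
    need = 0.0
    for stage, op in stage_ops.items():
        if cp_flags.get(stage, False):
            delay = NOMINAL_PERIOD_NS      # critical path in use: no slack
        elif op is None:
            delay = 0.0                    # bubble: no instruction to time
        else:
            delay = STAGE_DELAY_NS[stage][op]
        need = max(need, delay)
    # Round up to a whole number of DLL steps (small epsilon absorbs
    # floating-point error), then clamp to [MIN, NOMINAL].
    steps = math.ceil(need / DLL_STEP_NS - 1e-9)
    return min(max(steps * DLL_STEP_NS, MIN_PERIOD_NS), NOMINAL_PERIOD_NS)


def speedup(trace: List[Dict[str, Optional[str]]],
            cp_trace: List[Dict[str, bool]]) -> float:
    """Ratio of fixed-clock run time to adaptive-clock run time."""
    adaptive = sum(required_period(s, c) for s, c in zip(trace, cp_trace))
    return len(trace) * NOMINAL_PERIOD_NS / adaptive


if __name__ == "__main__":
    all_add = {"fetch": "add", "decode": "add", "execute": "add", "mem": "add"}
    with_fma = dict(all_add, execute="fma")
    print(required_period(all_add, {}))   # 3.5
    print(required_period(with_fma, {}))  # 5.0
    print(speedup([all_add, with_fma], [{}, {}]))
```

With these assumed numbers, a two-cycle trace (one all-`add` cycle and one cycle with an `fma` in the execute stage) takes 8.5 ns instead of 10 ns, a speedup of about 1.18. The elastic pipeline clocking for neural network workloads, which deliberately redistributes timing margin across stages, is not modeled in this sketch.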
Keywords
Pipelines, Clocks, Runtime, Delays, Computer architecture, Phase locked loops