35.3 Thread-Level Power Management for a Current- and Temperature-Limiting System in a 7nm Hexagon™ Processor

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Abstract
The Hexagon™ compute DSP (CDSP) integrates a master VLIW scalar processor and a slave vector coprocessor to enable high-performance, energy-efficient computing for multimedia, voice, audio, vision, imaging, and machine-learning (ML) applications [1]. The master processor executes scalar instruction packets and issues vector instruction packets to the slave coprocessor, which executes wide-data arithmetic and memory operations that deliver significant processing throughput at the cost of high power.

The power delivery for a mobile system-on-chip (SoC) processor consists of a battery that drives a PMIC, which generates the SoC supply voltage (V_DD) rails. The PMIC voltage regulator (VR) supplies V_DD while operating below a peak-current specification (spec). If the CDSP exceeds the peak-current spec for a sustained duration, the battery and/or the PMIC VR may incur a brownout condition in which V_DD degrades, resulting in circuit failures. The CDSP therefore requires a current-limiting system to prevent brownout. The latency requirement to detect that the current exceeds the peak-current spec and then to respond by operating at a lower current is ~1μs. The SoC must also operate within a target thermal design power and temperature, with detection and response latencies in the hundreds of μs and tens of ms, respectively.

Prior current- and temperature-limiting systems respond to exceeding the current or temperature specs by lowering the phase-locked loop (PLL) clock frequency (F_CLK) or by reducing both V_DD and F_CLK. Although these techniques are effective for response latencies above ~10μs, the time needed to reduce the PLL F_CLK, or V_DD and F_CLK, far exceeds the ~1μs latency spec. Alternative approaches that satisfy the ~1μs latency target include integrating an adaptive clocking circuit after the PLL [4] or throttling the instruction-issue rate [5] to change performance quickly.
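As a rough illustration of how issue-rate throttling can react to a peak-current violation without touching the PLL or V_DD, the Python sketch below models a generic global throttle loop. It is not the design of [4] or [5]; the function name, the per-interval decision, the halving and 1/8-step recovery, the 90% headroom threshold, and the assumption that current scales linearly with issue rate are all hypothetical.

```python
# Hypothetical sketch of a global issue-rate throttle for current limiting.
# Assumptions (not from the paper): current scales linearly with issue rate,
# one decision per control interval, multiplicative decrease / additive increase.


def global_issue_throttle(demand, peak_spec, down=0.5, up=0.125,
                          min_rate=0.125, headroom=0.9):
    """Return (current drawn, issue rate used) for each control interval.

    demand: estimated current at full issue rate, one value per interval.
    """
    rate = 1.0
    trace = []
    for full_current in demand:
        drawn = full_current * rate  # linear current model (assumed)
        trace.append((drawn, rate))
        if drawn > peak_spec:
            # Over the spec: cut the issue rate sharply for the next interval.
            rate = max(min_rate, rate * down)
        elif drawn < headroom * peak_spec:
            # Comfortably under the spec: recover performance gradually.
            rate = min(1.0, rate + up)
        # Between headroom*peak_spec and peak_spec: hold the current rate.
    return trace


if __name__ == "__main__":
    # Two light intervals, then a burst that demands 10 units against a 6-unit spec.
    for drawn, rate in global_issue_throttle([4.0] * 2 + [10.0] * 8, 6.0):
        print(f"rate={rate:.4f}  current={drawn:.3f}")
```

With this input the loop exceeds the spec in two intervals and then settles at an issue rate of 0.5625, drawing 5.625 units. Every thread is slowed by the same factor, which is the global behavior that thread-level management aims to improve on.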

The adaptive-clocking [4] and issue-throttling [5] designs satisfy the latency spec by reducing performance globally, without considering the power or priority of individual threads. This paper describes a thread-level power management (TPM) design that adapts the instruction-issue rate based on individual thread power and priority for a current- and temperature-limiting system in a 7nm [6] Hexagon CDSP. The TPM exploits low-power phases during thread execution to adjust each thread's instruction-issue rate, achieving higher performance at a target power than global throttling.
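To make the contrast between global and thread-level throttling concrete, the sketch below compares one issue rate shared by all threads against a simple per-thread policy that throttles low-priority, high-power threads first. It is a hypothetical illustration, not the TPM algorithm from this paper: the `Thread` fields, the priority convention, the linear power model, the 1/8 minimum rate, and the greedy ordering are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int       # larger = more important (assumed convention)
    full_power: float   # estimated power at full issue rate (arbitrary units)


def total_power(threads, rates):
    # Linear model (assumed): a thread's power scales with its issue rate.
    return sum(t.full_power * rates[t.name] for t in threads)


def global_throttle(threads, budget):
    """Baseline: one issue-rate fraction shared by every thread."""
    full = sum(t.full_power for t in threads)
    r = min(1.0, budget / full) if full > 0 else 1.0
    return {t.name: r for t in threads}


def thread_level_throttle(threads, budget, min_rate=0.125):
    """Hypothetical policy: throttle low-priority, high-power threads first."""
    rates = {t.name: 1.0 for t in threads}
    excess = total_power(threads, rates) - budget
    # Lowest priority first; among equal priorities, highest power first.
    for t in sorted(threads, key=lambda th: (th.priority, -th.full_power)):
        if excess <= 0:
            break
        if t.full_power <= 0:
            continue
        # Remove as much of the excess as this thread can give up above min_rate.
        cut = min((1.0 - min_rate) * t.full_power, excess)
        rates[t.name] = 1.0 - cut / t.full_power
        excess -= cut
    if excess > 0:
        # Every thread is already at min_rate: fall back to uniform throttling.
        return global_throttle(threads, budget)
    return rates


if __name__ == "__main__":
    threads = [
        Thread("A", priority=2, full_power=1.0),  # high priority, scalar phase
        Thread("B", priority=0, full_power=4.0),  # low priority, vector phase
        Thread("C", priority=0, full_power=1.0),  # low priority, scalar phase
        Thread("D", priority=2, full_power=3.0),  # high priority, vector phase
    ]
    for label, policy in [("global", global_throttle),
                          ("thread-level", thread_level_throttle)]:
        rates = policy(threads, budget=6.0)
        print(label, rates,
              f"power={total_power(threads, rates):.2f}",
              f"sum of rates={sum(rates.values()):.2f}")
```

At a budget of 6 units, the global policy runs every thread at a 2/3 issue rate (sum of rates about 2.67), while the per-thread policy keeps A, C, and D at full rate and throttles only the vector-heavy, low-priority thread B to 0.25 (sum of rates 3.25) at the same total power. Ordering by priority before power keeps important threads fast; a real design would also need to bound how long any thread stays at its minimum rate.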
Keywords
Hexagon compute DSP, master VLIW scalar processor, slave vector coprocessor, energy-efficient computing, slave coprocessor, power delivery, system-on-chip processor, PMIC voltage regulator, peak-current specification, PMIC VR, response latencies, temperature-limiting systems, phase-locked loop clock frequency, thread-level power management design, Hexagon CDSP, low-power phases, thread execution, thread instruction-issue rate, 7nm Hexagon processor, current-temperature-limiting system