Thread-Level Power Management for a Current- and Temperature-Limiting System in a 7nm Hexagon™ Processor

2021 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC)(2021)

Abstract
The Hexagon™ compute DSP (CDSP) integrates a master VLIW scalar processor and a slave vector coprocessor to enable high-performance and energy-efficient computing for multimedia, voice, audio, vision, imaging, and machine-learning (ML) applications [1]. The master processor executes scalar instruction packets and issues vector instruction packets to the slave coprocessor. The vector coprocessor executes wide-data arithmetic and memory operations for significant processing at the cost of high power. The power delivery for a mobile system-on-chip (SoC) processor consists of a battery that drives a PMIC to generate the SoC supply voltage ($V_{DD}$) rails. The PMIC voltage regulator (VR) supplies $V_{DD}$ while operating below a peak-current specification (spec). If the CDSP exceeds the peak-current spec for a sustained duration, then the battery and/or PMIC VR may incur a brownout condition where $V_{DD}$ degrades, resulting in circuit failures. Thus, the CDSP requires a current-limiting system to prevent brownout. The latency requirement to detect the current exceeding the peak-current spec and then to respond by operating at a lower current is $\sim 1\,\mu\mathrm{s}$. Also, the SoC must operate within a target thermal design power and temperature with detection and response latencies in 100’s of $\mu\mathrm{s}$ and 10’s of ms, respectively. Prior current- and temperature-limiting systems lower the phase-locked loop (PLL) clock frequency ($F_{CLK}$) or reduce $V_{DD}$ and $F_{CLK}$ [2], [3] in response to exceeding current or temperature specs. Although these techniques are effective for response latencies above $\sim 10\,\mu\mathrm{s}$, the time to reduce the PLL $F_{CLK}$ or $V_{DD}$ and $F_{CLK}$ far exceeds the $1\,\mu\mathrm{s}$ latency spec. Alternative approaches for satisfying the $1\,\mu\mathrm{s}$ latency target include integrating an adaptive clocking circuit after the PLL [4] or throttling the instruction-issue rate [5] to quickly change performance.
These designs satisfy the latency spec by globally reducing performance without considering individual thread power or priority. This paper describes a thread-level power management (TPM) design to adapt the instruction-issue rate based on individual thread power and priority for a current- and temperature-limiting system in a 7nm [6] Hexagon CDSP. The TPM exploits low-power phases during thread execution to adjust the thread instruction-issue rate, achieving higher performance at a target power than global throttling.
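The contrast between global throttling and thread-level throttling can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's TPM design: the abstract does not specify the allocation policy, so the `Thread` record, the linear power-versus-issue-rate model, the "higher number means higher priority" convention, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    power: float   # assumed power at full issue rate (arbitrary units)
    priority: int  # assumed convention: larger value = higher priority


def global_throttle(threads, budget):
    """Scale every thread's issue rate by the same factor to fit the budget."""
    total = sum(t.power for t in threads)
    rate = 1.0 if total <= budget else budget / total
    return {t.name: rate for t in threads}


def thread_level_throttle(threads, budget):
    """Grant issue rate in priority order until the power budget is used up.

    Assumes (for illustration only) that a thread's power scales linearly
    with its instruction-issue rate.
    """
    rates = {}
    remaining = budget
    for t in sorted(threads, key=lambda t: t.priority, reverse=True):
        if t.power <= 0:
            rates[t.name] = 1.0
            continue
        rate = min(1.0, max(0.0, remaining / t.power))
        rates[t.name] = rate
        remaining -= rate * t.power
    return rates


def total_power(threads, rates):
    """Power drawn under the linear model for a given set of issue rates."""
    return sum(t.power * rates[t.name] for t in threads)


# Hypothetical workload: a high-power vector thread and a low-power,
# higher-priority scalar thread sharing a budget of 3.0 units.
threads = [Thread("vec", 4.0, 1), Thread("scalar", 2.0, 2)]
print(global_throttle(threads, 3.0))
print(thread_level_throttle(threads, 3.0))
```

In this toy setting both policies meet the same power budget, but the thread-level policy leaves the low-power, high-priority thread at its full issue rate and throttles only the high-power thread, which is the kind of benefit the abstract attributes to considering individual thread power and priority.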