We describe an alternative approach, which we call a Multiple Clock Domain processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed

Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling

HPCA, pp.29-29, (2002)

Cited by 535 | Views 81

Abstract

As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several (coarse-grained) clo...

Introduction
  • The continuing push for higher microprocessor performance has led to unprecedented increases in clock frequencies in recent years.
  • Due to issues of reliability and performance, wire dimensions have been scaled in successive process generations more conservatively than transistor dimensions.
  • The result of these frequency and dimensional trends is that microprocessor clock speeds have become increasingly limited by wire delays, so much so that some of the more recent microprocessors, e.g., the Pentium IV [14], have pipeline stages solely dedicated to moving signals across the chip.
  • The inevitable conclusion reached by industrial researchers is that in order to continue the current pace of clock frequency increases, microprocessor designers will eventually be forced to abandon singly-clocked globally synchronous systems in favor of some form of asynchrony [8, 24].
Highlights
  • The continuing push for higher microprocessor performance has led to unprecedented increases in clock frequencies in recent years
  • The inevitable conclusion reached by industrial researchers is that in order to continue the current pace of clock frequency increases, microprocessor designers will eventually be forced to abandon singly-clocked globally synchronous systems in favor of some form of asynchrony [8, 24]
  • As we demonstrate in Section 4, these factors limit the amount of power savings that can be achieved with conventional dynamic voltage and frequency scaling
  • The baseline multiple clock domain configuration is split into four clock domains as described in Section 2 but with the frequency of all clocks statically set at 1GHz
  • The dynamic 1% and dynamic 5% configurations are identical to baseline multiple clock domain except that they support dynamic voltage and frequency scaling within each clock domain, as described in Section 3
  • The impact of misses can be seen in gcc, where the cache miss rate is high (12.5%) and the average frequency of the integer domain drops to approximately 920 MHz, but total performance degradation is less than 1%
  • We have described and evaluated a multiple clock domain (MCD) microarchitecture, which uses a globally-asynchronous, locally-synchronous (GALS) clocking style along with dynamic voltage and frequency scaling in order to maximize performance and energy efficiency for a given application
Methods
  • The authors' simulation testbed is based on the SimpleScalar toolset [6] with the Wattch [5] power estimation extensions.
  • The original SimpleScalar model supports out of order execution using a centralized Register Update Unit (RUU) [29].
  • The authors have modified this structure to more closely model the microarchitecture of the Alpha 21264 microprocessor [20].
  • A summary of the simulation parameters appears in Table 1.
  • Table 2 specifies the benchmarks used along with the window of instructions simulated.
  • The authors show combined statistics for the encode and decode phases of adpcm, epic, and g721, and for the mipmap, osdemo, and texgen phases of mesa.
Results
  • The authors compare the performance, energy, and energy-delay product of the MCD microarchitecture to that of a conventional singly clocked system.
  • The baseline configuration is a single-clock 1 GHz Alpha 21264-like system with no dynamic voltage or frequency scaling.
  • The dynamic 1% and dynamic 5% configurations are identical to baseline MCD except that they support dynamic voltage and frequency scaling within each clock domain, as described in Section 3.
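The three metrics compared above are related by simple arithmetic. The sketch below is illustrative only: the function names and the sample numbers are assumptions made for this example, not values from the paper. It computes the energy-delay product as energy multiplied by execution time and reports the fractional change of one configuration against a baseline.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction of a metric relative to the baseline (positive is better)."""
    return (baseline - candidate) / baseline


# Hypothetical numbers chosen only to illustrate the arithmetic.
base_energy, base_delay = 10.0, 1.00   # joules, seconds
mcd_energy, mcd_delay = 8.0, 1.05      # 20% less energy, 5% slower

base_edp = energy_delay_product(base_energy, base_delay)
mcd_edp = energy_delay_product(mcd_energy, mcd_delay)
edp_gain = relative_improvement(base_edp, mcd_edp)
print(f"EDP improvement: {edp_gain:.1%}")
```

Because the product weights energy and delay equally, a configuration that saves energy but runs slower shows a smaller energy-delay gain than its raw energy saving.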
Conclusion
  • The authors have described and evaluated a multiple clock domain (MCD) microarchitecture, which uses a globally-asynchronous, locally-synchronous (GALS) clocking style along with dynamic voltage and frequency scaling in order to maximize performance and energy efficiency for a given application.
  • By scaling frequency and voltage in different domains dynamically and independently, the authors can achieve an average improvement in energy-delay product of nearly 20%.
  • Global voltage scaling to achieve comparable performance degradation in a singly clocked microprocessor achieves an average energy-delay improvement of only 3%.
  • The authors will continue to investigate the circuit-level issues associated with being able to deliver tunable on-chip voltage and frequency with low latency.
Tables
  • Table 1: Architectural parameters for simulated processor
  • Table 2: Benchmarks
Related work
  • Several manufacturers, notably Intel [21] and Transmeta [16], have developed processors capable of global dynamic frequency and voltage scaling. Since minimum operational voltage is roughly proportional to frequency, and power is roughly proportional to the voltage squared, this dynamic scaling can be of major benefit in applications with real-time constraints for which the processor as a whole is over-designed: for example, video rendering. Marculescu [23] and Hsu et al [18] evaluated the use of whole-chip dynamic voltage scaling with minimal loss of performance using cache misses as the trigger [23]. Other work [7, 26] has also begun to look at steering instructions to pipelines or functional units running statically at different speeds so as to exploit scheduling slack in the program to save energy. Our contribution is to demonstrate that a microprocessor with multiple clock domains provides the opportunity to reduce power consumption on a variety of different applications without a significant performance impact by reducing frequency and voltage in domains that do not contribute significantly to the critical path of the current application phase.
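The proportionality argument in the paragraph above can be made concrete with a small calculation. This is a sketch under stated assumptions, not a model from the paper: it takes the idealized relations from the text (voltage scales linearly with frequency, power scales with voltage squared) and adds the common textbook assumption that dynamic power also scales linearly with frequency. It ignores leakage and any minimum-voltage floor, and the function names are hypothetical.

```python
def scaled_power(freq_ratio: float) -> float:
    """Relative dynamic power when the clock is scaled by freq_ratio.

    Assumes V is proportional to f and P is proportional to V^2 * f.
    """
    voltage_ratio = freq_ratio          # assumption: V scales linearly with f
    return voltage_ratio ** 2 * freq_ratio


def scaled_energy_fixed_work(freq_ratio: float) -> float:
    """Relative energy for a fixed number of cycles: power times the longer run time."""
    run_time_ratio = 1.0 / freq_ratio   # same cycle count at a slower clock
    return scaled_power(freq_ratio) * run_time_ratio


for ratio in (1.0, 0.9, 0.8, 0.5):
    print(f"f x{ratio:.1f}: power x{scaled_power(ratio):.3f}, "
          f"energy x{scaled_energy_fixed_work(ratio):.3f}")
```

Under these assumptions, halving the clock cuts power to one eighth and the energy for the same work to one quarter, at the cost of doubling the run time. That trade-off is why scaling pays off mainly where the slowdown does not affect the critical path or a deadline.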

    Govil et al [15] and Weiser et al [31] describe interval-based strategies to adjust the CPU speed based on processor utilization. The goal is to reduce energy consumption by attempting to keep the processor 100% utilized without significantly delaying task completion times. A history based on the utilization in previous intervals is used to predict the amount of work and thereby adjust speed for maximum utilization without work backlog. Pering et al [25] apply a similar principle to real-time and multimedia applications. Similarly, Hughes et al [19] use instruction count predictions for frame-based multimedia applications to dynamically change the global voltage and frequency of the processor while tolerating a low percentage of missed frame deadlines. Bellosa [2, 3] describes a scheme to associate energy usage patterns with every process in order to control energy consumption for the purposes of both cooling and battery life. Cache and memory behavior as well as process priorities are used as input in order to drive the energy control heuristics. Benini et al [4] present a system that monitors system activity and provides information to an OS module that manages system power. They use this monitoring system in order to demonstrate how to set the threshold idle time used to place a disk in low-power mode. Our work differs in that we attempt to slow down only those parts of the processor that are not on an application’s critical path.
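The interval-based idea described above (predict the next interval's work from recent utilization, then pick a speed that keeps the processor busy without building a backlog) can be sketched as follows. This is a hypothetical illustration, not the algorithm of any cited paper: the class name, the moving-average predictor, the history length, the minimum speed, and the 1.5x step-up on saturation are all assumptions made for this example.

```python
from collections import deque


class IntervalSpeedSetter:
    """Hypothetical interval-based speed governor (illustrative only)."""

    def __init__(self, history_len: int = 4, min_speed: float = 0.25,
                 step_up: float = 1.5):
        self.history = deque(maxlen=history_len)  # work done in recent intervals
        self.min_speed = min_speed
        self.step_up = step_up
        self.speed = 1.0                          # fraction of full speed

    def end_of_interval(self, utilization: float) -> float:
        """Record the finished interval and choose the speed for the next one.

        utilization is the busy fraction (0..1) observed at the current speed.
        """
        # Express work as the fraction of an interval it would need at full speed.
        self.history.append(utilization * self.speed)
        predicted_work = sum(self.history) / len(self.history)
        if utilization >= 1.0:
            # Saturated: true demand may exceed what was measured, so step up.
            target = self.speed * self.step_up
        else:
            # Run just fast enough for the predicted work to fill the interval.
            target = predicted_work
        self.speed = min(1.0, max(self.min_speed, target))
        return self.speed


governor = IntervalSpeedSetter()
for busy in (0.5, 0.5, 1.0, 1.0):
    print(f"utilization {busy:.2f} -> next speed {governor.end_of_interval(busy):.3f}")
```

The saturation branch is there because a fully busy interval hides how much work was left undone; without it, a predictor based only on completed work could never detect a backlog.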
Funding
  • This work was supported in part by NSF grants CCR–9701915, CCR–9702466, CCR–9705594, CCR–9811929, EIA–9972881, CCR–9988361, and EIA–0080124; by DARPA/ITO under AFRL contract F29601-00-K-0182; and by an external research grant from DEC/Compaq.
References
  • [1] D. H. Albonesi. Dynamic IPC/Clock Rate Optimization. Proceedings of the 25th International Symposium on Computer Architecture, pages 282–292, June 1998.
  • [2] F. Bellosa. OS-Directed Throttling of Processor Activity for Dynamic Power Management. Technical Report TRI4-3-99, C.S. Dept., University of Erlangen, Germany, June 1999.
  • [3] F. Bellosa. The Benefits of Event-Driven Energy Accounting in Power-Sensitive Systems. In Proceedings of the 9th ACM SIGOPS European Workshop, Sept. 2000.
  • [4] L. Benini, A. Bogliolo, S. Cavallucci, and B. Ricco. Monitoring System Activity for OS-directed Dynamic Power Management. In Proceedings of the International Symposium on Low-Power Electronics and Design, Aug. 1998.
  • [5] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In Proceedings of the 27th International Symposium on Computer Architecture, June 2000.
  • [6] D. Burger and T. Austin. The Simplescalar Tool Set, Version 2.0. Technical Report CS-TR-97-1342, University of Wisconsin, Madison, Wisconsin, June 1997.
  • [7] J. Casmira and D. Grunwald. Dynamic Instruction Scheduling Slack. In Proceedings of the Kool Chips Workshop, in conjunction with the 33rd International Symposium on Microarchitecture (MICRO-33), Dec. 2000.
  • [8] B. Chappell. The fine art of IC design. IEEE Spectrum, 36(7):30–34, July 1999.
  • [9] B. R. Childers, H. Tang, and R. Melhem. Adapting Processor Supply Voltage to Instruction-Level Parallelism. In Proceedings of the Kool Chips Workshop, in conjunction with the 33rd International Symposium on Microarchitecture (MICRO-33), Dec. 2000.
  • [10] L. T. Clark. Circuit Design of XScale Microprocessors. In 2001 Symposium on VLSI Circuits, Short Course on Physical Design for Low-Power and High-Performance Microprocessor Circuits. IEEE Solid-State Circuits Society, June 2001.
  • [11] J. H. Edmondson et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor. Digital Technical Journal, 7(1):119–135, 1995. Special Edition.
  • [12] B. Fields, S. Rubin, and R. Bodik. Focusing Processor Policies via Critical-Path Prediction. In Proceedings of the 28th International Symposium on Computer Architecture, July 2001.
  • [13] M. Fleischmann. Longrun power management. Technical report, Transmeta Corporation, Jan. 2001.
  • [14] P. N. Glaskowsky. Pentium 4 (Partially) Previewed. Microprocessor Report, 14(8):1,11–13, Aug. 2000.
  • [15] K. Govil, E. Chang, and H. Wasserman. Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU. In Proceedings of the 1st ACM/IEEE International Conference on Mobile Computing and Networking, pages 13–25, Nov. 1995.
  • [16] T. R. Halfhill. Transmeta breaks x86 low-power barrier. Microprocessor Report, 14(2), Feb. 2000.
  • [17] T. Horel and G. Lauterbach. UltraSPARC III: Designing Third-Generation 64-Bit Performance. IEEE Micro, 19(3):73–85, May/June 1999.
  • [18] C.-H. Hsu, U. Kremer, and M. Hsiao. Compiler-Directed Dynamic Frequency and Voltage Scaling. In Proceedings of the Workshop on Power-Aware Computer Systems, in conjunction with the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), Nov. 2000.
  • [19] C. J. Hughes, J. Srinivasan, and S. V. Adve. Saving Energy with Architectural and Frequency Adaptations for Multimedia Applications. In Proceedings of the 34th Annual International Symposium on Microarchitecture (MICRO-34), Dec. 2001.
  • [20] R. E. Kessler, E. J. McLellan, and D. A. Webb. The Alpha 21264 Microprocessor Architecture. In Proceedings of the International Conference on Computer Design, pages 90–95, Austin, Texas, Oct. 1998. IEEE Computer Society.
  • [21] S. Leibson. XScale (StrongArm-2) Muscles In. Microprocessor Report, 14(9):7–12, Sept. 2000.
  • [22] T. Li and C. Ding. Instruction Balance, Energy Consumption and Program Performance. Technical Report UR-CSTR-739, Computer Science Dept., University of Rochester, Dec. 2000. Revised February 2001.
  • [23] D. Marculescu. On the Use of Microarchitecture-Driven Dynamic Voltage Scaling. In Proceedings of the Workshop on Complexity-Effective Design, in conjunction with the 27th International Symposium on Computer Architecture, June 2000.
  • [24] D. Matzke. Will Physical Scalability Sabotage Performance Gains? IEEE Computer, 30(9):37–39, Sept. 1997.
  • [25] T. Pering, T. Burd, and R. W. Brodersen. The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms. In Proceedings of the International Symposium on Low-Power Electronics and Design, Aug. 1998.
  • [26] R. Pyreddy and G. Tyson. Evaluating Design Tradeoffs in Dual Speed Pipelines. In Proceedings of the Workshop on Complexity-Effective Design, in conjunction with the 28th International Symposium on Computer Architecture, June 2001.
  • [27] L. F. G. Sarmenta, G. A. Pratt, and S. A. Ward. Rational Clocking. In Proceedings of the International Conference on Computer Design, Austin, Texas, Oct. 1995.
  • [28] A. E. Sjogren and C. J. Myers. Interfacing Synchronous and Asynchronous Modules Within A High-Speed Pipeline. In Proceedings of the 17th Conference on Advanced Research in VLSI, pages 47–61, Ann Arbor, Michigan, Sept. 1997.
  • [29] G. Sohi. Instruction Issue Logic for High-Performance Interruptible, Multiple Functional Unit, Pipelined Computers. ACM Transactions on Computer Systems, 39(3):349–359, Mar. 1990.
  • [30] TSMC Corp. TSMC Technology Roadmap, July 2001.
  • [31] M. Weiser, A. Demers, B. Welch, and S. Shenker. Scheduling for Reduced CPU Energy. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation, Nov. 1994.