
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers

IEEE Trans. Computers, no. 12 (1987): 1425-1439


Abstract

This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems. Utilizing loop parallelism is clearly most crucial in achieving high system and program performance. Because of its simplicity, guided self-scheduling is particularly suited for implementation on real parallel machines ...

Introduction
  • This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems.
  • Because of its simplicity, guided self-scheduling is suited for implementation on real parallel machines.
  • This method achieves simultaneously the two most important objectives: load balancing and very low synchronization overhead.
  • An even greater obstacle is the inability to efficiently utilize such massively parallel systems
  • Problems such as specifying parallelism, mapping or scheduling parallel programs on a given architecture, synchronizing the execution of a parallel program, memory management in parallel processing environments, and compiling for parallel machines remain areas for much future work.
  • Existing methods are not adequate because they consider an idealized form of the problem where task execution times are fixed and known in advance, and they ignore "side-effects"
Highlights
  • This paper proposes guided self-scheduling (GSS), a new method for executing parallel loops on parallel processor systems
  • Since parallel loops account for the greatest percentage of parallelism in numerical programs, the efficient scheduling of such loops is vital to program and system performance
  • In this paper we presented an efficient dynamic approach to solve the loop scheduling problem
  • By guiding the amount of work given to each processor, very good load balancing is achieved
  • If the GSS scheme is coupled with loop coalescing, the overhead can be further reduced; by choosing the minimum unit of allocation, guided self-scheduling can be tuned to perform optimally for any given loop-system combination (a sketch of the chunk-size rule follows this list). We showed that for certain types of loops the GSS scheme is optimal.
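The bullets above describe GSS in terms of guiding the amount of work handed to each idle processor. As a minimal sketch, assuming the rule of assigning an idle processor the ceiling of (remaining iterations)/(number of processors), bounded below by a minimum allocation unit k, the dispatch logic looks roughly as follows; the function and variable names are illustrative, not taken from the paper.

```c
/* Sketch of the GSS(k) chunk-size rule: an idle processor takes
 * ceil(remaining / P) iterations, but never fewer than k.
 * Names (gss_next_chunk, remaining, ...) are illustrative. */
#include <stdio.h>

/* Returns how many iterations the next idle processor should take,
 * updating *remaining; returns 0 when the loop is exhausted. */
static long gss_next_chunk(long *remaining, long P, long k)
{
    if (*remaining <= 0)
        return 0;
    long chunk = (*remaining + P - 1) / P;   /* ceil(remaining / P) */
    if (chunk < k)
        chunk = k;                           /* GSS(k): minimum allocation unit */
    if (chunk > *remaining)
        chunk = *remaining;                  /* never over-assign */
    *remaining -= chunk;
    return chunk;
}

int main(void)
{
    long remaining = 1000, chunk;
    /* Print the chunk schedule for a 1000-iteration loop on 8 processors. */
    while ((chunk = gss_next_chunk(&remaining, 8, 1)) > 0)
        printf("%ld ", chunk);
    putchar('\n');
    return 0;
}
```

Early chunks are large, which cuts the number of synchronized accesses to the loop index, while the final chunks shrink toward the minimum unit, which is what balances the finishing times of the processors.
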
Methods
  • DESIGN RULES FOR RUN-TIME SCHEDULING SCHEMES
  • When static scheduling is used, the run-time overhead is minimal.
  • ... processors requested by that task. This is true assuming that a task cannot start execution unless all processors allocated to it ...
  • Allocation is very useful when it is used in a different context.
  • This is a logical consequence of dynamic scheduling.
  • While at compile time the compiler or an intelligent preprocessor is responsible for making the scheduling decisions, at run time this decision must be made in a case-by-case fashion, and the time spent for this decision-making process is reflected in the program's execution time.
  • Since the authors want to avoid deliberate idling of processors as much as possible, any run-time scheduling scheme should rather be designed to ask questions of the following type: "How much work should the authors give to this processor?" In other words, when a processor becomes idle, ... (a worked example of this allocation rule follows this list)
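To make the "how much work" question concrete (the numbers here are illustrative, not from the paper's experiments): under the GSS rule of handing an idle processor the ceiling of R/P of the R still-unassigned iterations, a loop of 100 iterations on P = 4 processors is dispatched in chunks of 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1. That is 14 dispatch operations instead of the 100 that a one-iteration-at-a-time scheme would need, and the small final chunks are what smooth out the finishing times of the processors.
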
Results
  • A simulator was implemented to study the performance of self-scheduling (SS) and GSS(1); a sketch contrasting the two dispatch schemes follows this list.
  • In the SS scheme, loop scheduling was done by assigning a single iteration to each idle processor [2], [6], [28].
  • Idle processors access each loop index in a loop nest by using appropriate synchronization instructions.
  • The simulator was designed to accept program traces generated by Parafrase, and it can be extended to implement other scheduling strategies.
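The contrast the simulator measures can be made concrete with a small sketch. This is not the simulator or the paper's implementation (the target machines used their own synchronization primitives); it is a C11-atomics illustration, under assumed placeholder names (loop_body, worker_ss, worker_gss) and example values of N and P, of where the overhead difference between per-iteration and per-chunk dispatch comes from.

```c
/* Per-iteration (SS) vs. guided per-chunk (GSS) dispatch of a parallel loop. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 100000L   /* total loop iterations (example value) */
#define P 4L        /* number of processors (example value) */

static atomic_long next_iter;          /* shared loop index */
static atomic_long remaining = N;      /* iterations not yet handed out (GSS) */
static atomic_long dispatches;         /* counts synchronized dispatch operations */

static void loop_body(long i) { (void)i; /* placeholder for the iteration's work */ }

/* SS: one synchronized access to the loop index per iteration. */
static void *worker_ss(void *arg)
{
    (void)arg;
    long i;
    while ((i = atomic_fetch_add(&next_iter, 1)) < N) {
        atomic_fetch_add(&dispatches, 1);
        loop_body(i);
    }
    return NULL;
}

/* GSS(1): one synchronized dispatch per chunk of ceil(remaining / P) iterations. */
static void *worker_gss(void *arg)
{
    (void)arg;
    for (;;) {
        long rem = atomic_load(&remaining);
        long chunk;
        do {
            if (rem <= 0)
                return NULL;
            chunk = (rem + P - 1) / P;               /* guided chunk size */
        } while (!atomic_compare_exchange_weak(&remaining, &rem, rem - chunk));
        atomic_fetch_add(&dispatches, 1);
        long start = atomic_fetch_add(&next_iter, chunk);
        for (long i = start; i < start + chunk; i++)
            loop_body(i);
    }
}

int main(void)
{
    pthread_t t[P];
    for (long p = 0; p < P; p++)
        pthread_create(&t[p], NULL, worker_gss, NULL);   /* swap in worker_ss to compare */
    for (long p = 0; p < P; p++)
        pthread_join(t[p], NULL);
    /* GSS needs far fewer dispatches than the N required by SS. */
    printf("dispatches: %ld (N = %ld)\n", atomic_load(&dispatches), N);
    return 0;
}
```
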
Conclusion
  • Since parallel loops account for the greatest percentage of parallelism in numerical programs, the efficient scheduling of such loops is vital to program and system performance.
  • In this paper the authors presented an efficient dynamic approach to solve the loop scheduling problem.
  • Two important objectives are automatically satisfied: low overhead and load balancing.
  • By guiding the amount of work given to each processor, very good load balancing is achieved.
  • The assignment of large iteration blocks, on the other hand, reduces the number of accesses to loop indexes and the run-time overhead
Tables
  • Table 1: The detailed scheduling events of the example of Fig. 2
Funding
  • This work was supported in part by the National Science Foundation under Grants NSF
Study subjects and analysis
Loops that cover most cases: 5
Serial and parallel loops are specified by the programmer, or are created by a restructuring compiler (e.g., Parafrase). The loops of Fig. 5 cover most cases since they include loops that are 1) all parallel and perfectly nested (L1), 2) hybrid and perfectly nested (L3), 3) all parallel and nonperfectly nested (L2), 4) hybrid and nonperfectly nested (L4), and 5) finally one-way (L2) and multiway nested (L4). The arrows in L4 indicate flow dependences between adjacent loops.
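Loop coalescing, mentioned above as a companion transformation to GSS, can be illustrated on the simplest of these cases, a perfectly nested all-parallel double loop like L1. The sketch below is an assumed example (array a and bounds N, M are made up, not the loops of Fig. 5); it only shows how a two-dimensional iteration space is flattened into a single loop that a scheme like GSS can then schedule through one shared index.

```c
/* Illustrative loop coalescing for a perfectly nested parallel double loop:
 * the N x M iteration space is flattened into one loop of N*M iterations,
 * and the original indexes are recovered with a division and a remainder. */
#define N 64
#define M 128
static double a[N][M];

/* Original nest: both loops are parallel (doall). */
void scale_nested(void)
{
    for (long i = 0; i < N; i++)
        for (long j = 0; j < M; j++)
            a[i][j] = 2.0 * a[i][j];
}

/* Coalesced form: a single parallel loop, suitable for scheduling
 * (e.g., by GSS) through one shared loop index. */
void scale_coalesced(void)
{
    for (long k = 0; k < (long)N * M; k++) {
        long i = k / M;          /* recover row index */
        long j = k % M;          /* recover column index */
        a[i][j] = 2.0 * a[i][j];
    }
}
```

After coalescing, a single GSS dispatch can hand out any block of the combined N*M iteration space, which is why the two techniques compose well.
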

References
  • [1] A. V. Aho and J. D. Ullman, Principles of Compiler Design. Reading, MA: Addison-Wesley, 1977.
  • [2] Alliant Computer Systems Corp., FX/Series Architecture Manual, Acton, MA, 1985.
  • [3] U. Banerjee, "Speedup of ordinary programs," Ph.D. dissertation, Univ. Illinois, Urbana-Champaign, DCS Rep. UIUCDCS-R-79-989, Oct. 1979.
  • [4] E. G. Coffman, Jr., Ed., Computer and Job-Shop Scheduling Theory. New York: Wiley, 1976.
  • [5] E. G. Coffman and R. L. Graham, "Optimal scheduling on two processor systems," Acta Informatica, vol. 1, no. 3, 1972.
  • [6] "Multitasking user guide," Cray Comput. Syst. Tech. Note SN-0222, Jan. 1985.
  • [7] R. G. Cytron, "Doacross: Beyond vectorization for multiprocessors," extended abstract, in Proc. 1986 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1986, pp. 836-844.
  • [8] M.S. thesis, Univ. Illinois, Urbana-Champaign, DCS Rep. UIUCDCS-R-81-1070, May 1981.
  • [9] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: Freeman, 1979.
  • [10] A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD shared-memory parallel machine," IEEE Trans. Comput., vol. C-32, pp. 175-189, Feb. 1983.
  • [11] "Parallel supercomputing today and the Cedar approach," Science, vol. 231, pp. 967-974, Feb. 28, 1986.
  • [12] R. L. Graham, "Bounds on multiprocessor scheduling anomalies and related packing algorithms," in Proc. Spring Joint Comput. Conf., 1972.
  • [13] D. J. Kuck, R. Kuhn, D. Padua, B. Leasure, and M. Wolfe, "Dependence graphs and compiler optimizations," in Proc. 8th ACM Symp. Principles of Programming Languages, Jan. 1981, pp. 207-218.
  • [14] H. Kobayashi, Modeling and Analysis, 2nd ed. Reading, MA: Addison-Wesley, 1981.
  • [15] C. Kruskal and A. Weiss, "Allocating independent subtasks on parallel processors," IEEE Trans. Software Eng., vol. SE-11, Oct. 1985.
  • [16] D. J. Kuck, The Structure of Computers and Computations, Vol. 1. New York: Wiley, 1978.
  • [17] D. J. Kuck et al., "The effects of program restructuring, algorithm change and architecture choice on program performance," in Proc. Int. Conf. Parallel Processing, Aug. 1984.
  • [18] D. A. Padua Haiek, "Multiprocessors: Discussions of some theoretical and practical problems," Ph.D. dissertation, Univ. Illinois, Urbana-Champaign, DCS Rep. UIUCDCS-R-79-990, Nov. 1979.
  • [19] C. D. Polychronopoulos and U. Banerjee, "Processor allocation for horizontal and vertical parallelism and related speedup bounds," IEEE Trans. Comput., vol. C-36, Apr. 1987.
  • [20] C. D. Polychronopoulos, D. J. Kuck, and D. A. Padua, "Execution of parallel loops on parallel processor systems," in Proc. 1986 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1986, pp. 519-527.
  • [21] C. D. Polychronopoulos, "On program restructuring, scheduling and communication for parallel processor systems," Ph.D. dissertation, Rep. 595, Center Supercomput. Res. Development, Univ. Illinois, Aug. 1986.
  • [22] C. D. Polychronopoulos, "Loop coalescing: A compiler transformation for parallel machines," in Proc. 1987 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1987.
  • [23] S. Reinhardt, "A data-flow approach to multitasking on CRAY X-MP," IEEE Trans. Comput., vol. C-33, July 1984.
  • ... "Cm* multiprocessor," in Proc. 1985 Int. Conf. Distributed Comput.
  • [26] B. Smith, "Architecture and applications of the HEP multiprocessor computer system," in Real Time Processing IV, Proc. SPIE, 1981, pp. 241-248.
  • [27] H. S. Stone, "Multiprocessor scheduling with the aid of network flow algorithms," IEEE Trans. Software Eng., vol. SE-3, Jan. 1977.
  • [28] P. Tang and P. C. Yew, "Processor self-scheduling for multiple-nested parallel loops," in Proc. 1986 Int. Conf. Parallel Processing, Aug. 1986.
  • [29] ..., DCS Rep. UIUCDCS-R-82-1105, 1982.
  • [30] C. Q. Zhu, P. C. Yew, and D. H. Lawrie, "Cedar synchronization primitives," Lab. Advanced Supercomput., Cedar Doc. 18, Sept. 1983.