Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Trans. Computers, no. 12 (1987): 1425-1439
This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems. Utilizing loop parallelism is clearly most crucial in achieving high system and program performance. Because of its simplicity, guided self-scheduling is particularly suited for implementation on real parallel machines.
- This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems.
- Because of its simplicity, guided self-scheduling is suited for implementation on real parallel machines.
- This method achieves simultaneously the two most important objectives: load balancing and very low synchronization overhead.
- An even greater obstacle is the inability to efficiently utilize such massively parallel systems
- Problems such as specifying parallelism, mapping or scheduling parallel programs on a given architecture, synchronizing the execution of a parallel program, memory management in parallel processing environments, and compiling for parallel machines remain areas for much future work.
- Existing methods are not adequate because they consider an idealized form of the problem where task execution times are fixed and known in advance, and they ignore "side-effects"
- This paper proposes guided self-scheduling (GSS), a new method for executing parallel loops on parallel processor systems
- Since parallel loops account for the greatest percentage of parallelism in numerical programs, the efficient scheduling of such loops is vital to program and system performance
- In this paper we presented an efficient dynamic approach to solve the loop scheduling problem
- By guiding the amount of work given to each processor, very good load balancing is achieved
- If the GSS scheme is coupled with loop coalescing, the overhead can be further reduced. By choosing the minimum unit of allocation, guided self-scheduling can be tuned to perform optimally for any given loop-system combination. We showed that for certain types of loops the GSS scheme is optimal.
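The guided self-scheduling rule summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the GSS(1) policy in which each idle processor claims ⌈R/p⌉ of the R remaining iterations, so chunks start large and taper down to single iterations near the end of the loop (the function name is hypothetical).

```python
import math

def gss_chunks(total_iters: int, num_procs: int):
    """Yield successive chunk sizes under guided self-scheduling, GSS(1):
    an idle processor takes ceil(R / p) of the R remaining iterations."""
    remaining = total_iters
    while remaining > 0:
        chunk = math.ceil(remaining / num_procs)
        yield chunk
        remaining -= chunk

# 100 iterations on 4 processors: chunk sizes taper from 25 down to 1,
# so the last few assignments are small and the processors finish together.
print(list(gss_chunks(100, 4)))
```

The tapering is what delivers the paper's two objectives at once: the early large chunks keep the number of synchronized accesses to the loop index low, while the trailing single-iteration chunks smooth out load imbalance at loop exit.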
- DESIGN RULES FOR RUN-TIME SCHEDULING SCHEMES
- When static scheduling is used, the run-time overhead is minimal. This is true assuming that a task cannot start execution unless all processors requested by that task are available.
- Allocation is very useful when it is used in a different context.
- This is a logical consequence of dynamic scheduling.
- While at compile time the compiler or an intelligent preprocessor is responsible for making the scheduling decisions, at run time this decision must be made in a case-by-case fashion, and the time spent for this decision-making process is reflected in the program's execution time.
- Since the authors want to avoid deliberate idling of processors as much as possible, any run-time scheduling scheme should rather be designed to ask questions of the following type: "How much work should we give to this processor?" In other words, when a processor becomes idle, the scheduler must decide how much work to assign to it.
- A simulator was implemented to study the performance of self-scheduling (SS) and GSS (GSS(1)).
- In the SS scheme, loop scheduling was done by assigning a single iteration to each idle processor.
- Idle processors access each loop index in a loop nest by using appropriate synchronization instructions.
- The simulator was designed to accept program traces generated by Parafrase, and it can be extended to implement other scheduling strategies.
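The overhead gap between the two simulated schemes can be made concrete with a small sketch. This is not the authors' simulator; it only counts dispatch operations (synchronized accesses to the loop index), assuming SS hands out one iteration per access while GSS(1) hands out ⌈R/p⌉ iterations per access (function names are hypothetical):

```python
def ss_dispatches(n_iters: int) -> int:
    """Self-scheduling: one synchronized index access per iteration."""
    return n_iters

def gss_dispatches(n_iters: int, num_procs: int) -> int:
    """GSS(1): one synchronized access per chunk of ceil(R / p) iterations."""
    count, remaining = 0, n_iters
    while remaining > 0:
        remaining -= -(-remaining // num_procs)  # subtract ceil(R / p)
        count += 1
    return count

# For a 100-iteration loop on 4 processors, SS pays 100 dispatch
# operations while GSS(1) pays only 14.
print(ss_dispatches(100), gss_dispatches(100, 4))
```

This is the trade-off the paper's simulation results quantify: GSS reduces the number of synchronized index accesses by roughly a factor of n / (p ln(n/p)) while still assigning small final chunks for load balancing.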
- Two important objectives are automatically satisfied: low overhead and load balancing.
- The assignment of large iteration blocks, on the other hand, reduces the number of accesses to loop indexes and the run-time overhead
- Table 1: THE DETAILED SCHEDULING EVENTS OF THE EXAMPLE OF FIG. 2
- This work was supported in part by the National Science Foundation under Grants NSF
Serial and parallel loops are specified by the programmer, or are created by a restructuring compiler (e.g., Parafrase). The loops of Fig. 5 cover most cases since they include loops that are 1) all parallel and perfectly nested (L1), 2) hybrid and perfectly nested (L3), 3) all parallel and nonperfectly nested (L2), 4) hybrid nonperfectly nested (L4), and 5) one-way (L2) and multiway nested (L4). The arrows in L4 indicate flow dependences between adjacent loops.
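Loop coalescing, which the conclusion pairs with GSS to reduce overhead, can be illustrated on the perfectly nested case. The sketch below is an assumption-level example, not the Parafrase transformation itself: it collapses a doubly nested parallel loop into a single loop over n1*n2 iterations, recovering the original indices by integer division and remainder so a single GSS-scheduled index serves both loop levels.

```python
def coalesced_indices(n1: int, n2: int):
    """Collapse a 2-deep perfectly nested loop into one loop;
    recover (i, j) from the single coalesced index k via div/mod."""
    for k in range(n1 * n2):
        i, j = divmod(k, n2)  # i = k // n2, j = k % n2
        yield i, j

# The coalesced loop visits exactly the iterations of
# "for i in range(2): for j in range(3): ..."
print(list(coalesced_indices(2, 3)))
```

With the nest flattened this way, idle processors contend for one shared loop index instead of one per nesting level, which is why coalescing further cuts the synchronization cost of GSS.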
- Constantine D. Polychronopoulos received the B.S. degree from the University of Athens, Athens, Greece, in 1980, the M.S. degree from Vanderbilt University, Nashville, TN, in 1982, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 1986.
- David J. Kuck (S'59-M'69-SM'83-F'85), was born in Muskegon, MI, on October 3, 1937. He received the B.S.E.E. degree from the University of Michigan, Ann Arbor, in 1959, and the M.S. and Ph.D. degrees from Northwestern University, Evanston, IL, in 1960 and 1963, respectively.
- Dr. Kuck has served as an Editor for a number of professional journals, including the IEEE TRANSACTIONS ON COMPUTERS and the Journal of the Association for Computing Machinery.