Dynamic data prefetching in home-based software DSMs
J. Comput. Sci. Technol., no. 3 (2001): 231-241
Abstract
A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method implemented in the JIAJIA software DSM to reduce system overhead caused by remote accesses. The prefetching method records t...
Introduction
- Software Distributed Shared Memory (DSM) provides the illusion of shared memory on top of distributed memory hardware.
- Most software DSM systems are page-based, using virtual memory protection to trap accesses to shared memory.
- These systems suffer from the high communication and coherence-induced overheads caused by the high level of implementation and large granularity of coherence.
- Many techniques, such as the multiple-writer protocol[1], lazy release consistency[2], and data migration[3], have been proposed to reduce false sharing and remote communication.
- The page faulting processor continues on receiving the page acknowledgement message
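The link between large coherence granularity and false sharing can be illustrated with a small sketch. It is not taken from the paper: the function names, the 8-byte word size, the 16 KiB region, and the interleaved write pattern are all assumptions chosen for illustration. Two processors write disjoint addresses, yet with page-sized coherence units they still touch the same pages.

```python
def pages_touched(addresses, page_size):
    """Map byte addresses to the set of page numbers they fall in."""
    return {addr // page_size for addr in addresses}


def falsely_shared_pages(addrs_a, addrs_b, page_size):
    """Pages touched by both processors although no address is shared."""
    if set(addrs_a) & set(addrs_b):
        raise ValueError("addresses overlap: this would be true sharing")
    return pages_touched(addrs_a, page_size) & pages_touched(addrs_b, page_size)


# Hypothetical layout: processor A writes the even-numbered 8-byte words,
# processor B writes the odd-numbered ones, over a 16 KiB region.
writes_a = [i * 8 for i in range(0, 2048, 2)]
writes_b = [i * 8 for i in range(1, 2048, 2)]

# With 4 KiB pages, every page of the region is falsely shared.
coarse = falsely_shared_pages(writes_a, writes_b, page_size=4096)
# With 8-byte coherence units, nothing is shared at all.
fine = falsely_shared_pages(writes_a, writes_b, page_size=8)
```

In this toy layout all four 4 KiB pages are falsely shared, while a word-sized coherence unit would show no sharing. A page-based system cannot shrink the unit below a page, which is why protocol-level techniques and prefetching are used instead.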
Highlights
- Software Distributed Shared Memory (DSM) provides the illusion of shared memory on top of distributed memory hardware
- The number of remote access messages is reduced because multiple pages may be prefetched in the same message
- The average extra traffic caused by useless prefetch is 7%-13% in the evaluation
- The prefetching scheme proposed in this paper predicts prefetches by analyzing the periodicity of the access history string of remote writes and local accesses
- The average extra traffic caused by useless prefetch is only 7% in the evaluation when the periodicity threshold is 3, which is much less than that of other prefetching methods such as that introduced in [11]
- In the proposed prefetching scheme, the remote access latency can be overlapped with other operations and multiple pages may be prefetched in the same message
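The message-count saving from sending several prefetches in one message can be sketched as follows. This is only an illustration under stated assumptions: `home_of` and its round-robin home assignment are hypothetical, as is the idea of grouping requests per home node. The summary above says only that multiple pages may be prefetched in the same message.

```python
from collections import defaultdict


def home_of(page, num_nodes):
    """Hypothetical home assignment: pages are distributed round-robin."""
    return page % num_nodes


def batch_prefetch_requests(pages, num_nodes, local_node):
    """Group predicted pages by home node: one request per remote home."""
    batches = defaultdict(list)
    for page in sorted(set(pages)):
        home = home_of(page, num_nodes)
        if home != local_node:  # pages homed locally need no message
            batches[home].append(page)
    return dict(batches)


# Hypothetical prediction for one synchronization point on node 0.
predicted = [5, 9, 13, 6, 4, 9]
batches = batch_prefetch_requests(predicted, num_nodes=4, local_node=0)
# Pages 5, 9 and 13 share home node 1, so they travel in one request.
```

Here four remote pages would cost four request messages if fetched on demand, but only two when grouped by home node.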
Results
- In Water, JIAp2 issues more than 30% useless remote accesses and traffic, while JIAp3 ...
Conclusion
- The prefetching scheme proposed in this paper predicts prefetches by analyzing the periodicity of the access history string of remote writes and local accesses.
- The periodicity analysis method can predict prefetches rather precisely.
- The average extra traffic caused by useless prefetch is only 7% in the evaluation when the periodicity threshold is 3, which is much less than that of other prefetching methods such as that introduced in [11].
- In the proposed prefetching scheme, the remote access latency can be overlapped with other operations and multiple pages may be prefetched in the same message.
- Among eight benchmarks, the prefetching scheme achieves a performance increment of 15%-20% in three benchmarks and around 8%-10% in another three.
Tables
- Table1: Table 1
- Table2: Run Time Statistics of Parallel Execution
- Table3: Relative Runtime Statistics
- Table4: FFT Results with Different Plimit
Related work
- There is some previous work on data prefetching in software DSMs. A similar approach was proposed in [11] by Karlsson et al. Their approach is also based on the previous access history in software DSMs and also issues prefetching messages after synchronization. However, their approach is based on a homeless software DSM (TreadMarks), while ours is based on a home-based software DSM. The prefetching algorithms also differ: theirs decides prefetching according to remote and local accesses during the last two intervals, while ours analyzes the periodicity of the previous interleaving string of INV (invalidation) and GETP (fetching a remote page) events.
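One way to picture periodicity analysis over an INV/GETP string is sketched below. It is a minimal illustration, not the algorithm from the paper, whose details are not given here. The encoding ("I" for INV, "G" for GETP), the function names, and the rule "a period counts only if it repeats at least `threshold` times at the end of the history" are all assumptions.

```python
def detect_period(history, threshold):
    """Return the shortest period p such that the last p events repeat
    at least `threshold` times back-to-back at the end of `history`,
    or None if no such period exists."""
    n = len(history)
    for p in range(1, n // threshold + 1):
        tail = history[n - p:]
        if history[n - p * threshold:] == tail * threshold:
            return p
    return None


def should_prefetch(history, threshold):
    """Predict the next event from the detected period; prefetch on 'G'."""
    p = detect_period(history, threshold)
    if p is None:
        return False
    # In a p-periodic string the next event equals the one p steps back.
    return history[len(history) - p] == "G"


# Hypothetical per-page history: invalidation, fetch, invalidation, ...
print(should_prefetch("IGIGI", threshold=2))  # pattern seen twice
print(should_prefetch("IGIGI", threshold=3))  # not enough repetitions
```

With this rule, a higher threshold demands more evidence before prefetching, which trades fewer useless prefetches against fewer prefetches overall.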
Funding
- This work is supported by the National Natural Science Foundation of China (No.60073018)
Reference
- Carter J, Bennett J, Zwaenepoel W. Implementation and performance of Munin. In Proc. the 13th Symp. Operating Systems Principles, Oct., 1991, pp.152-164.
- Keleher P, Dwarkadas S, Cox A, Zwaenepoel W. TreadMarks distributed shared memory on standard workstations and operating systems. In Proc. the 1994 Winter Usenix Conf., Jan., 1994, pp.115-131.
- Hu Weiwu, Shi Weisong, Tang Zhimin. Optimizing home-based software DSM protocols. Cluster Computing, to appear in 2001.
- Hu Weiwu, Shi Weisong, Tang Zhimin, Li Ming. A lock-based cache coherence protocol for scope consistency. Journal of Computer Science and Technology, Mar., 1998, 13(2): 97-109.
- Woo S, Ohara M, Torrie E et al. The SPLASH-2 programs: Characterization and methodological considerations. In Proc. ISCA'95, 1995, pp.24-36.
- Bailey D, Barton J, Lasinski T, Simon H. The NAS parallel benchmarks. Technical Report No.103863, NASA, Jul., 1993.
- Lu H, Dwarkadas S, Cox A, Zwaenepoel W. Quantifying the performance differences between PVM and TreadMarks. Journal of Parallel and Distributed Computing, Jun., 1997, 43(2): 65-78.
- Iftode L. Home-based shared virtual memory [dissertation]. Princeton University, Aug., 1998.
- Hu Weiwu, Shi Weisong, Tang Zhimin. Reducing system overhead in home-based software DSMs. In Proc. the 13th Int. Parallel Processing Symp., Apr., 1999, pp.167-173.
- Hu Weiwu, Zhang Fuxin, Liu Haiming. A new home-based software DSM protocol for SMP clusters. In Proc. the 6th Euro-Par Conference, Aug., 2000, pp.1132-1142.
- Karlsson M, Stenstrom P. Effectiveness of dynamic prefetching in multiple-writer distributed virtual shared memory system. Journal of Parallel and Distributed Computing, Jun., 1997, 43(2): 79-93.
- Bianchini R, Kontothanassis L, Pinto R et al. Hiding communication latency and coherence overhead in software DSMs. In Proc. 7th Int. Conf. Architectural Support for Programming Languages and Operating Systems, 1996, pp.198-209.
- Mowry T, Gupta A. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. Journal of Parallel and Distributed Computing, Jun., 1991, 12(2): 87-106.
- Dwarkadas S, Lu H, Cox A et al. Combining compile-time and runtime support for efficient software distributed shared memory. In Proc. IEEE, Special Issue on Distributed Shared Memory, Mar., 1999, pp.476-486.
- Keleher P, Tseng C. Enhancing software DSM for compiler-parallelized applications. In Proc. the 11th Int. Parallel Processing Symposium, Apr., 1997.
- Chandra S, Larus J. Optimizing communication in HPF programs for fine-grained distributed shared memory. In Proc. the 6th Symp. Principles and Practice of Parallel Programming, Jun., 1997.
- Amza C, Cox A, Dwarkadas S et al. Adaptive protocols for software distributed shared memory. In Proc. IEEE, Special Issue on Distributed Shared Memory, Mar., 1999, pp.467-475.
- Bershad B, Zekauskas M, Sawdon W. The Midway Distributed Shared Memory System. In Proc. the 38th IEEE Int. CompCon Conf., Feb., 1993, pp.528-537.
- Dwarkadas S, Schaffer A, Cottingham R et al. Parallelization of general linkage analysis problems. Human Heredity, 1994, 44: 127-141.
- Lathrop G, Lalouel J, Julier C, Ott J. Strategies for multilocus analysis in humans. PNAS, 1994, 81: 3443-3446.
- Li K. IVY: A shared virtual memory system for parallel computing. In Proc. the 1988 Int. Conf. Parallel Processing, Aug., 1988, 2: 94-101.
- Schaffer A, Gupta S, Shriram K, Cottingham R. Avoiding recomputation in genetic linkage analysis. Human Heredity, 1994, 44: 225-237.
HU Weiwu received his B.S. degree from the University of Science and Technology of China in 1991 and his Ph.D. degree from the Institute of Computing Technology, The Chinese Academy of Sciences in 1996, both in computer science. He is currently a professor in the Institute of Computing Technology. His research interests include high performance computer architecture, parallel processing, and SOC design.
ZHANG Fuxin received his B.S. degree in computing technology from the University of Science and Technology of China in 1999. He is currently an M.S. candidate in the Institute of Computing Technology, The Chinese Academy of Sciences. His research interests include high performance computer architecture, cluster computing, and LINUX.
LIU Haiming received his B.S. degree in computing technology from the University of Science and Technology of China in 1999. He is currently an M.S. candidate in the Institute of Computing Technology, The Chinese Academy of Sciences. His research interests include high performance computer architecture and cluster computing.