Resource-Aware Cache Management for In-Memory Data Analytics Frameworks

2019 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)

Abstract
Continuously increasing amounts of data have led to the emergence of distributed in-memory computing systems that deliver higher data processing speeds. Because memory is a limited resource, finding an efficient memory management method has become key to improving the performance of these systems. Recent research in dependency-aware caching, e.g., Least Reference Count (LRC) and Least Composition Reference Count (LCRC), has achieved significant efficiency improvements by profiling the application's Directed Acyclic Graph (DAG). However, these methods do not account for the dynamic occupancy mechanism of the Unified Memory Manager (UMM) in Spark, which can cause heavily referenced and costly data blocks to be evicted, producing high recomputation overhead. To address this shortcoming, we propose a resource-aware cache management approach that uses both runtime resource metrics and dependency information. By applying an adaptive approach, we retain the data blocks that contribute most to producing the final results. We demonstrate the effectiveness of our cache management approach through a series of widely used benchmarks. The experimental results show that, compared with current DAG-aware implementations, our approach improves performance by 15% on average and by up to 24%. Compared with the default Least Recently Used (LRU) strategy in Spark, our implementation reduces job runtime by 28% on average and by up to 61%.
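To make the eviction idea concrete, here is a minimal, hypothetical sketch in Scala (Spark's implementation language) of what a resource-aware eviction score might look like: it combines a block's DAG reference count and recomputation cost with a memory-pressure weight, so that under pressure large, rarely referenced blocks are evicted first. The names CachedBlock, score, evictionCandidate, and memoryPressure are illustrative assumptions for this sketch, not the paper's actual algorithm or Spark's API.

```scala
// Illustrative sketch only: a resource-aware eviction score combining a
// block's DAG reference count and recomputation cost with memory pressure.
// All names here are hypothetical, not the paper's algorithm or Spark's API.
case class CachedBlock(id: String, refCount: Int, recomputeCostMs: Long, sizeBytes: Long)

object ResourceAwareEviction {
  // Higher score = more valuable to keep. Utility grows with how often the
  // block is referenced in the DAG and how expensive it is to recompute;
  // under high memory pressure, large blocks are penalized more heavily.
  def score(b: CachedBlock, memoryPressure: Double): Double = {
    val utility = b.refCount.toDouble * b.recomputeCostMs
    utility / (1.0 + memoryPressure * b.sizeBytes)
  }

  // The lowest-scoring cached block is the eviction victim.
  def evictionCandidate(blocks: Seq[CachedBlock], memoryPressure: Double): Option[CachedBlock] =
    if (blocks.isEmpty) None
    else Some(blocks.minBy(score(_, memoryPressure)))
}

object Demo extends App {
  val blocks = Seq(
    CachedBlock("rdd_1_0", refCount = 3, recomputeCostMs = 400L, sizeBytes = 64L << 20),
    CachedBlock("rdd_2_0", refCount = 1, recomputeCostMs = 50L, sizeBytes = 128L << 20)
  )
  // Under memory pressure, the large, rarely referenced block wins eviction.
  println(ResourceAwareEviction.evictionCandidate(blocks, memoryPressure = 1e-8))
}
```

A pure LRC policy would look only at refCount; the size and memory-pressure terms in this sketch stand in for the runtime resource metrics the abstract says are combined with the dependency information.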
Keywords
Cache replacement, cache management, in-memory computing, data analytics frameworks, submodular optimization