Optimization Techniques for a Distributed In-Memory Computing Platform by Leveraging SSD

APPLIED SCIENCES-BASEL (2021)

Abstract
In this paper, we present several optimization strategies that improve the overall performance of the distributed in-memory computing system Apache Spark. Despite its distributed memory management capability for iterative jobs and intermediate data, Spark suffers significant performance degradation when the amount of available main memory (DRAM, typically used for data caching) is limited. To address this problem, we leverage an SSD (solid-state drive) to compensate for the lack of main memory bandwidth. Specifically, we present an optimization methodology for Apache Spark that jointly investigates the effects of (1) changing the capacity fraction ratios of the shuffle and storage spaces in the Spark JVM heap configuration and (2) applying different RDD caching policies (e.g., SSD-backed memory caching). Our extensive experimental results show that the proposed optimization techniques improve overall performance by up to 42%.
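As a rough illustration of the two knobs named in the abstract, the sketch below builds `spark-submit` flags for one shuffle/storage split and points Spark's scratch directory at an SSD. It is not the authors' setup. It assumes the legacy (pre-unified) memory manager keys `spark.storage.memoryFraction` and `spark.shuffle.memoryFraction`, and the helper name `spark_conf_args`, the 0.4/0.4 split, and the path `/mnt/ssd/spark-tmp` are hypothetical.

```python
def spark_conf_args(storage_fraction, shuffle_fraction, ssd_dir):
    """Build spark-submit --conf flags for one heap split (hypothetical helper).

    Assumes Spark's legacy memory manager; the abstract does not say which
    Spark version or configuration keys the authors used.
    """
    if not (0.0 < storage_fraction < 1.0 and 0.0 < shuffle_fraction < 1.0):
        raise ValueError("fractions must lie strictly between 0 and 1")
    if storage_fraction + shuffle_fraction > 1.0:
        raise ValueError("storage + shuffle fractions exceed the heap")

    settings = {
        # Needed on Spark 1.6-2.x to re-enable the legacy fraction keys.
        "spark.memory.useLegacyMode": "true",
        # Share of the JVM heap reserved for cached RDD blocks.
        "spark.storage.memoryFraction": str(storage_fraction),
        # Share of the JVM heap reserved for shuffle buffers.
        "spark.shuffle.memoryFraction": str(shuffle_fraction),
        # Scratch space for shuffle files and disk-persisted RDD blocks;
        # the SSD mount point here is an assumption.
        "spark.local.dir": ssd_dir,
    }

    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


args = spark_conf_args(0.4, 0.4, "/mnt/ssd/spark-tmp")
print(" ".join(args))
```

The caching policy itself is chosen per RDD in application code. In PySpark, for example, `rdd.persist(StorageLevel.MEMORY_AND_DISK)` spills blocks that do not fit in memory to the directory configured above, which is one plausible reading of "SSD-backed memory caching".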
Keywords
Apache Spark, memory management, solid-state drive, in-memory processing framework, performance, PageRank, transitive closure, TeraSort, k-means clustering, Java Virtual Machine heap configuration, resilient distributed dataset