Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning.

arXiv: Distributed, Parallel, and Cluster Computing(2018)

引用 35|浏览30
暂无评分
摘要
Abstract In many domains, the previous decade was characterized by increasing data volumes and growing complexity of data analyses, creating new demands for batch processing on distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e. g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as execution duration, required memory or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i. e., methods that can be applied to any workload without modification, since the workload is usually a black box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, limitations and challenges, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field.
更多
查看译文
关键词
Distributed computing,Resource management,Machine learning,Black box monitoring
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要