A predictive scheduling framework for fast and distributed stream data processing

Big Data (2015)

Abstract

In a distributed stream data processing system, an application is usually modeled as a directed graph in which each vertex corresponds to a data source or a processing unit and each edge indicates data flow. In this paper, we propose a novel predictive scheduling framework to enable fast and distributed stream data processing, which features topology-aware performance prediction and predictive scheduling. For prediction, we present a topology-aware method to accurately predict the average tuple processing time of an application for a given scheduling solution, according to the topology of the application graph and runtime statistics. For scheduling, we present an effective algorithm to assign threads to machines under the guidance of prediction results. To validate and evaluate the proposed framework, we implemented it on a highly regarded distributed stream data processing platform, Storm, and tested it with two representative applications: word count (stream version) and log stream processing. Extensive experimental results show that (1) the topology-aware prediction method offers an average accuracy of 83.7%, and (2) the predictive scheduling framework reduces the average tuple processing time by an average of 25.9% compared to Storm's default scheduler.
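The abstract does not spell out the prediction model or the scheduling algorithm, so the following is only a minimal sketch of the general idea of prediction-guided thread placement, not the paper's method. Everything in it is an assumption made for illustration: the three-thread topology, the per-thread costs in `CPU_MS`, the cross-machine penalty `REMOTE_MS`, the slot limit `SLOTS`, the toy cost model in `predict_ms`, and the greedy strategy in `greedy_schedule`.

```python
# Hypothetical application graph: spout -> split -> count (threads as vertices).
EDGES = [("spout", "split"), ("split", "count")]

# Assumed per-thread processing cost in ms per tuple (illustrative numbers only).
CPU_MS = {"spout": 1.0, "split": 2.0, "count": 3.0}

REMOTE_MS = 1.5  # assumed extra delay when an edge crosses machines
SLOTS = 2        # assumed maximum number of threads per machine


def predict_ms(assignment):
    """Toy stand-in for a predictor of average tuple processing time.

    Each thread's cost is scaled by the number of threads sharing its
    machine, and every edge whose endpoints sit on different machines
    adds a fixed transfer penalty.
    """
    load = {}
    for machine in assignment.values():
        load[machine] = load.get(machine, 0) + 1
    cpu = sum(CPU_MS[t] * load[m] for t, m in assignment.items())
    net = sum(
        REMOTE_MS
        for src, dst in EDGES
        if src in assignment and dst in assignment
        and assignment[src] != assignment[dst]
    )
    return cpu + net


def greedy_schedule(threads, machines):
    """Place threads one by one on the machine with the lowest predicted time."""
    assignment = {}
    for thread in threads:
        # Only machines with a free slot are candidates.
        candidates = [
            m for m in machines
            if sum(1 for used in assignment.values() if used == m) < SLOTS
        ]
        assignment[thread] = min(
            candidates, key=lambda m: predict_ms({**assignment, thread: m})
        )
    return assignment


if __name__ == "__main__":
    plan = greedy_schedule(["spout", "split", "count"], ["m1", "m2"])
    print(plan, predict_ms(plan))
```

Under these assumed numbers, the greedy pass keeps `split` and `count` on the same machine to avoid one network hop, whereas an alternating placement would pay the transfer penalty on both edges. A real predictor would be fitted to runtime statistics rather than hard-coded.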

Keywords

log stream processing, word count, Storm, runtime statistics, average tuple processing time prediction, topology-aware performance prediction, data flow, graph edges, processing unit, data source, graph vertex, directed graph, fast-distributed stream data processing, predictive scheduling framework
