Adaptive Stream Processing using Dynamic Batch Sizing

MOD(2014)

Abstract
The need for real-time processing of "big data" has led to the development of frameworks for distributed stream processing in clusters. It is important for such frameworks to be robust against variable operating conditions such as server failures, changes in data ingestion rates, and changes in workload characteristics. To provide fault tolerance and efficient stream processing at scale, recent stream processing frameworks have proposed treating streaming workloads as a series of batch jobs on small batches of streaming data. However, the robustness of such frameworks against variable operating conditions has not been explored. In this paper, we explore the effects of the batch size on the performance of streaming workloads. The throughput and end-to-end latency of the system can have complicated relationships with batch sizes, data ingestion rates, variations in available resources, workload characteristics, and other factors. We propose a simple yet robust control algorithm that automatically adapts the batch size as the situation necessitates. We show through extensive experiments that it can ensure system stability and low latency for a wide range of workloads, despite large variations in data rates and operating conditions.
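The abstract does not spell out the control algorithm, so the following is only an illustrative sketch of the general idea of adapting the batch size, not the paper's method. It assumes that "stable" means each batch finishes processing within its batch interval (otherwise work queues up without bound), and it uses a generic damped fixed-point update that steers processing time toward a fraction `target_ratio` of the interval. The class `BatchIntervalController`, its parameters, the `simulate` helper, and the linear cost model (fixed overhead plus a per-record cost) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchIntervalController:
    """Hypothetical controller, NOT the paper's algorithm.

    Tries to keep each batch's processing time near `target_ratio` times the
    batch interval, so batches finish before the next one is due.
    """

    interval: float             # current batch interval, in seconds
    target_ratio: float = 0.8   # desired processing_time / interval (< 1 leaves headroom)
    damping: float = 0.5        # fraction of the correction applied per step
    min_interval: float = 0.1   # lower bound on the interval, in seconds
    max_interval: float = 10.0  # upper bound on the interval, in seconds

    def update(self, processing_time: float) -> float:
        """Return the next interval given the last batch's processing time."""
        # Interval at which the observed processing time would equal the target ratio.
        desired = processing_time / self.target_ratio
        # Move only part of the way there to damp oscillations.
        proposed = self.interval + self.damping * (desired - self.interval)
        # Clamp to the configured bounds.
        self.interval = min(self.max_interval, max(self.min_interval, proposed))
        return self.interval


def simulate(steps: int, rate: float) -> list[float]:
    """Drive the controller with a toy workload model (hypothetical numbers).

    Toy assumption: processing time = overhead + per_record_cost * records,
    where records = rate * interval (records per second times seconds).
    """
    overhead, per_record_cost = 0.2, 0.0005  # seconds, seconds per record
    ctrl = BatchIntervalController(interval=1.0)
    history = []
    for _ in range(steps):
        processing_time = overhead + per_record_cost * rate * ctrl.interval
        history.append(ctrl.update(processing_time))
    return history


if __name__ == "__main__":
    print(simulate(50, rate=1000.0)[-1])  # settles near 2/3 s
    print(simulate(50, rate=2000.0)[-1])  # cannot keep up; pinned at max_interval
```

With `rate=1000.0`, the toy processing time is `0.2 + 0.5 * interval`, and the controller settles at an interval of about 0.667 s, where processing takes about 0.533 s (80% of the interval). With `rate=2000.0`, the per-record cost alone exceeds `target_ratio` of any interval, so no batch size is stable under this model and the interval grows until it hits `max_interval`. The damping factor trades convergence speed for resistance to oscillation; a real system would also need to filter noisy processing-time measurements.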