Adaptive watermark generation mechanism based on time series prediction for stream processing

FRONTIERS OF COMPUTER SCIENCE(2021)

引用 3|浏览6
暂无评分
摘要
The data stream processing framework processes the stream data based on event-time to ensure that the request can be responded to in real-time. In reality, streaming data usually arrives out-of-order due to factors such as network delay. The data stream processing framework commonly adopts the watermark mechanism to address the data disorderedness. Watermark is a special kind of data inserted into the data stream with a timestamp, which helps the framework to decide whether the data received is late and thus be discarded. Traditional watermark generation strategies are periodic; they cannot dynamically adjust the watermark distribution to balance the responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on the time series prediction model to address the above limitation. This mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered data ratio and other lateness properties of the data stream to improve the system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experiment results show that our mechanism is superior to the existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
更多
查看译文
关键词
data stream processing, watermark, time series based prediction, dynamic adjustment
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络