Adaptive key partitioning in distributed stream processing

Gang Liu, Zeting Wang,Amelie Chi Zhou,Rui Mao

CCF Transactions on High Performance Computing（2024）

Cited 0|Views15

No score

Abstract

In stream processing systems, Key Grouping is a commonly employed partitioning scheme for distributing input tuples among parallel instances of stateful operators. With key grouping, tuples shared public keys in the stream are designated to the specific instance responsible for that key. Typically, the implementation of key grouping involves the use of a hash function. While it is convenient and deterministic, it is also known to cause load imbalance between parallel instances, especially in the presence of skewed data streams. Key-Splitting is an effective technique that distributes tasks associated with keys to downstream operators, facilitating load balancing at a relatively low cost. However, overly increasing parallel instances can lead to excessive aggregation costs, becoming a system bottleneck. In this paper, we show the high aggregation cost brought by the Key-Splitting partitioner at different levels of key separation. To address this challenge, we introduce an adaptive Key-Splitting method which controlling the degree of key separation. We propose a partitioner named FlexD, which aims to achieve dynamic adaptation of key separation limits for streaming data. The partitioner employs key grouping to distribute rare keys and dynamic expansion of processing instances to distribute hot keys. We implemented our method on Apache Storm and evaluated it by using real-world and synthetic datasets. Experimental results show that our method achieves a good balance between load balancing and aggregation cost. Moreover, it outperforms existing methods, achieving higher throughput.

Translated text

Key words

Stream processing,Stream partitioning,Load balancing,Key grouping

AI Read Science

Must-Reading Tree

Example

Generate MRT to find the research sequence of this paper

Chat Paper

Summary is being generated by the instructions you defined