Adding Velocity to BigBench

SIGMOD/PODS '18: International Conference on Management of Data Houston TX USA June, 2018(2018)

引用 5|浏览17
暂无评分
摘要
BigBench standardized as TPCx-BB is a popular application benchmark that targets Big Data storage and processing systems. BigBench V2 addresses some of the BigBench limitations by introducing a new simplified data model, semi-structured web logs in JSON file format and new queries mandating late binding. However, it still covers only batch processing workloads and the Big Data velocity characteristic is not addressed. This work extends the BigBench V2 benchmark with a data streaming component that simulates typical statistical and predictive analytics queries in a retail business scenario. Our approach is to preserve the existing BigBench design and introduce a new streaming component that supports two data streaming modes: active and passive. In active mode, the data stream generation and processing happen in parallel, whereas in passive mode, the data stream is pre-generated in advance before the actual stream processing. The stream workload consists of five queries inspired by the existing 30 BigBench queries. To validate the proposed streaming extension, the two streaming modes were implemented and tested using Kafka and Spark Streaming. The experimental results prove the feasibility of our benchmark design. Finally, we outline design challenges and future plans for improving the proposed BigBench extension.
更多
查看译文
关键词
BigBench, Streaming Benchmark, Big Data Benchmarking
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要