Scaling Event Aggregation at Twitter to Handle Billions of Events per Minute

2020 IEEE Infrastructure Conference (2020)

Abstract
Log files consisting of events from different services are a rich source of information for large-scale analytics. Events can be as simple as a log line or as complex as nested structured objects such as Thrift or Protocol Buffers messages. At Twitter, every service logs events for a particular category and publishes them to the Event Log Aggregation framework. This framework aggregates events of the same category into log files, usually stored on a distributed file system such as the Hadoop Distributed File System (HDFS). Large-scale, multi-petabyte analytics across hundreds of projects use these files. In this paper, we provide an overview of the Event Aggregation framework used at Twitter, highlight its advantages, and compare it with similar frameworks. We also introduce the concepts of category groups and aggregator groups in our architecture. Services at Twitter generate trillions of events every day, amounting to multiple petabytes of data. At present, this framework handles over three billion events per minute. The main focus of our efforts has been efficient use of hardware resources, scalability, and reliability of the framework.
Keywords
Event log collection, event log aggregation, log analytics, streaming event logs
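
The abstract's core idea, grouping events of the same category into one log, and its throughput figure can be illustrated with a minimal sketch. This is not the framework's implementation: the function name `aggregate_by_category`, the `(category, payload)` event shape, and the sample category names are all assumptions made for illustration. The only figure taken from the abstract is three billion events per minute, used here as a lower bound.

```python
from collections import defaultdict


def aggregate_by_category(events):
    """Group (category, payload) pairs so each category maps to one ordered log.

    Hypothetical helper: the real framework writes per-category files to a
    distributed file system; here an in-memory list stands in for a log file.
    """
    logs = defaultdict(list)
    for category, payload in events:
        logs[category].append(payload)
    return dict(logs)


# Hypothetical sample events; the category names are illustrative only.
events = [
    ("client_event", "login"),
    ("ads_view", "ad_42"),
    ("client_event", "logout"),
]
aggregated = aggregate_by_category(events)

# Throughput arithmetic based on the abstract's "over three billion events
# per minute" figure, treated as a lower bound.
EVENTS_PER_MINUTE = 3_000_000_000
events_per_second = EVENTS_PER_MINUTE // 60      # 50 million per second
events_per_day = EVENTS_PER_MINUTE * 60 * 24     # 4.32 trillion per day
```

At that rate the daily total comes to roughly 4.3 trillion events, which is consistent with the abstract's statement that services generate trillions of events every day.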