Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments

2018 IEEE 11th International Conference on Cloud Computing (CLOUD) (2018)

Abstract
Stream processing and batch data processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process the ever-increasing volume of data. Generally, these systems are developed as single projects with aspects such as communication, task management, and data management integrated together. By contrast, we take a component-based approach to big data by developing the essential features of a big data system as independent components with polymorphic implementations to support different requirements. Consequently, we recognize the requirements of both the dataflow style used in popular Apache systems and the Bulk Synchronous Processing communication style common in High-Performance Computing (HPC) for different applications. Message Passing Interface (MPI) implementations are dominant in HPC, but there are no such standard libraries available for big data. Twister:Net is a stand-alone, highly optimized dataflow-style parallel communication library which can be used by big data systems or advanced users. Twister:Net can work both in cloud environments using TCP and in HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance.
Keywords
Big-data, Collectives, Streaming, MPI, HPC