Blind men and an elephant: coalescing open-source, academic, and industrial perspectives on BigData

Data Engineering (2015)

Abstract
This tutorial is organized in two parts. In the first half, we will present an overview of applications and services in the BigData ecosystem. We will use known distributed database and systems literature as landmarks to orient the attendees in this fast-evolving space. Throughout, we will contrast models of resource management, performance, and the constraints that shape the architectures of prominent systems. We will also discuss the role of academia and industry in the development of open-source infrastructure, with an emphasis on open problems and strategies for collaboration. We assume only basic familiarity with distributed systems. In the second half, we will delve into Apache Hadoop YARN. YARN (Yet Another Resource Negotiator) transformed Hadoop from a MapReduce engine to a general-purpose cluster scheduler. Since its introduction, it has been deployed in production and extended to support use cases beyond large-scale batch processing. The tutorial will present the active research and development supporting such heterogeneous workloads, with particular attention to multi-tenant scheduling. Topics include security, resource isolation, protocols, and preemption. This portion will be detailed, but accessible to anyone with a background in distributed systems and all attendees of the first half of the tutorial.
Keywords
big data, batch processing (computers), data handling, distributed databases, parallel processing, public domain software, apache hadoop yarn, bigdata ecosystem, mapreduce engine, distributed database, general-purpose cluster scheduler, large-scale batch processing, multitenant scheduling, open-source, resource management, yet another resource negotiator, databases, engines, ecosystems