Naiad: a timely dataflow system

SOSP, 2013.

Cited by: 626 | Views: 216 | DOI: https://doi.org/10.1145/2517349.2522738
Naiad’s performance and expressiveness demonstrate that timely dataflow is a powerful general-purpose low-level programming abstraction for iterative and streaming computation.

Abstract

Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have re...

Introduction
  • Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed.
Highlights
  • Many data processing tasks require low-latency interactive access to results, iterative sub-computations, and consistent intermediate outputs so that sub-computations can be nested and composed
  • We have developed a new computational model, timely dataflow, whose features include: 1. structured loops allowing feedback in the dataflow, and 2. stateful dataflow vertices capable of consuming and producing records without global coordination.
  • Although the timestamps in timely dataflow are more complicated than traditional integer-valued timestamps [22, 38], the vertex programming model supports many advanced use cases that motivate other systems.
  • Naiad’s performance and expressiveness demonstrate that timely dataflow is a powerful general-purpose low-level programming abstraction for iterative and streaming computation.
  • Researchers benefit from the ability to differentiate advances in high-level abstractions from advances in the design and implementation of low-level systems, while users benefit from a wider variety of composable programming patterns and fewer, more fully realized systems
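
The remark about timestamps being richer than plain integers can be made concrete with a small sketch. It assumes the structure described in the Naiad paper itself (an input epoch plus one counter per enclosing loop, compared as a partial order); the class and method names below are hypothetical and are not Naiad's actual API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical model of a timely dataflow timestamp (not Naiad's API)."""
    epoch: int                      # which round of external input produced the record
    counters: Tuple[int, ...] = ()  # one iteration counter per enclosing loop

    def enter_loop(self) -> "Timestamp":
        # Entering a loop context appends a fresh counter starting at 0.
        return Timestamp(self.epoch, self.counters + (0,))

    def leave_loop(self) -> "Timestamp":
        # Leaving a loop context drops the innermost counter.
        if not self.counters:
            raise ValueError("timestamp is not inside a loop")
        return Timestamp(self.epoch, self.counters[:-1])

    def next_iteration(self) -> "Timestamp":
        # Travelling around the feedback edge increments the innermost counter.
        if not self.counters:
            raise ValueError("timestamp is not inside a loop")
        *outer, inner = self.counters
        return Timestamp(self.epoch, (*outer, inner + 1))

    def less_equal(self, other: "Timestamp") -> bool:
        # Partial order: epochs compared as integers, loop counters
        # lexicographically. Only timestamps at the same loop depth compare.
        if len(self.counters) != len(other.counters):
            raise ValueError("timestamps belong to different loop depths")
        return self.epoch <= other.epoch and self.counters <= other.counters


# A record from epoch 1 that has gone around a loop twice.
t = Timestamp(1).enter_loop().next_iteration().next_iteration()
```

Because the order is only partial, two timestamps can be incomparable: in this model, epoch 1 at iteration 5 and epoch 2 at iteration 0 are unordered, which a single integer timestamp cannot express.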
Conclusion
  • Although the timestamps in timely dataflow are more complicated than traditional integer-valued timestamps [22, 38], the vertex programming model supports many advanced use cases that motivate other systems.

    The requirement that a vertex explicitly request notifications allows a programmer to make performance tradeoffs by choosing when to use coordination.
  • The monotonic aggregation operators in BloomL [13] may continually revise their output without coordination; in Naiad a vertex can achieve this by sending outputs from ONRECV.
  • Such an implementation can improve performance inside a loop by allowing fast uncoordinated iteration, at the possible expense of sending multiple messages before the output reaches its final value.
  • Researchers benefit from the ability to differentiate advances in high-level abstractions from advances in the design and implementation of low-level systems, while users benefit from a wider variety of composable programming patterns and fewer, more fully realized systems
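
The tradeoff between sending outputs from ONRECV and waiting for a requested notification can be sketched with a toy model. Everything here is a simplified assumption: `on_recv`, `on_notify`, and `notify_at` loosely mirror the callbacks named in the Naiad paper, `run` is a hypothetical single-threaded harness, and notifications are simply delivered after all input has been seen rather than by a distributed progress-tracking protocol.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, int]  # (value, time)


class EagerMin:
    """Uncoordinated: emits from on_recv whenever the minimum improves."""

    def __init__(self, emit: Callable[[int, int], None],
                 notify_at: Callable[[int], None]) -> None:
        self.emit = emit
        self.best: Dict[int, int] = {}  # never calls notify_at

    def on_recv(self, value: int, time: int) -> None:
        if time not in self.best or value < self.best[time]:
            self.best[time] = value
            self.emit(value, time)  # may later be revised by a smaller value

    def on_notify(self, time: int) -> None:
        self.best.pop(time, None)


class CoordinatedMin:
    """Coordinated: buffers state and emits once, when notified."""

    def __init__(self, emit: Callable[[int, int], None],
                 notify_at: Callable[[int], None]) -> None:
        self.emit = emit
        self.notify_at = notify_at
        self.best: Dict[int, int] = {}

    def on_recv(self, value: int, time: int) -> None:
        if time not in self.best:
            self.notify_at(time)  # explicitly request a notification
            self.best[time] = value
        else:
            self.best[time] = min(self.best[time], value)

    def on_notify(self, time: int) -> None:
        # All records for `time` have been delivered; the value is final.
        self.emit(self.best.pop(time), time)


def run(vertex_cls, records: List[Record]) -> List[Record]:
    """Toy harness: deliver all records, then all requested notifications."""
    out: List[Record] = []
    requested = set()
    vertex = vertex_cls(lambda value, time: out.append((value, time)),
                        requested.add)
    for value, time in records:
        vertex.on_recv(value, time)
    for time in sorted(requested):
        vertex.on_notify(time)
    return out


records = [(5, 0), (3, 0), (4, 0), (1, 0)]
eager_output = run(EagerMin, records)
coordinated_output = run(CoordinatedMin, records)
```

In this toy run the eager vertex sends three messages for time 0 before settling on the minimum, while the coordinated vertex sends one message but only after the notification arrives.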
Tables
  • Table 1: Running times in seconds of several graph algorithms on the Category A web graph. Non-Naiad measurements are due to Najork et al. [34].
Related Work
  • Dataflow. Recent systems such as CIEL [30], Spark [42], Spark Streaming [43], and Optimus [19] extend acyclic batch dataflow [15, 18] to allow dynamic modification of the dataflow graph, and thus support iteration and incremental computation without adding cycles to the dataflow. By adopting a batch-computation model, these systems inherit powerful existing techniques including fault tolerance with parallel recovery; in exchange each requires centralized modifications to the dataflow graph, which introduce substantial overhead that Naiad avoids. For example, Spark Streaming can process incremental updates in around one second, while in Section 6 we show that Naiad can iterate and perform incremental updates in tens of milliseconds.

    Stream processing systems support low-latency dataflow computations over a static dataflow graph, using punctuations in the stream of records [38] to signal completeness. Punctuations can implement blocking operators such as GROUP BY [38], but do not support general iteration. MillWheel [5] is a recent example of a streaming system with punctuations (and sophisticated fault-tolerance) that adopts a vertex API very similar to Naiad’s, but does not support loops. Chandramouli et al. propose the flying fixed-point operator [9] to handle cyclic streams when dataflows do not allow record retraction. In contrast, Naiad can execute algorithms that use retractions, such as sliding-window connected components and strongly connected components.
Contributions
  • Shows that many powerful high-level programming models can be built on Naiad’s low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining
  • Our goal is to develop a general-purpose system that fulfills all of these requirements and supports a wide variety of high-level programming models, while achieving the same performance as a specialized system
  • Shows that these primitives are sufficient to express existing frameworks as composable and efficient libraries
  • Evaluates Naiad against several batch and incremental workloads, and uses microbenchmarks to investigate the performance of its underlying mechanisms
References
  • [1] The ClueWeb09 Dataset. http://lemurproject.org/clueweb09.
  • [2] Parallel Data Warehouse. http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx.
  • [3] Storm: Distributed and fault-tolerant realtime computation. http://storm-project.net/.
  • [4] M. Abadi, F. McSherry, D. G. Murray, and T. L. Rodeheffer. Formal analysis of a distributed algorithm for tracking progress. In Proceedings of the IFIP Joint International Conference on Formal Techniques for Distributed Systems, June 2013.
  • [5] T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. MillWheel: fault-tolerant stream processing at Internet scale. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), Aug. 2013.
  • [6] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In Proceedings of the ACM International Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM), Aug. 2010.
  • [7] P. Alvaro, N. Conway, J. M. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In Proceedings of the 5th Conference on Innovative Data Systems Research (CIDR), Jan. 2011.
  • [8] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. ul Haq, M. I. ul Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas. Windows Azure Storage: a highly available cloud storage service with strong consistency. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011.
  • [9] B. Chandramouli, J. Goldstein, and D. Maier. On-the-fly progress detection in iterative stream queries. Proceedings of the Very Large Database Endowment (PVLDB), 2(1):241–252, Aug. 2009.
  • [10] R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: taking the pulse of a fast-changing and connected world. In Proceedings of the EuroSys Conference, Apr. 2012.
  • [11] J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the straggler problem with bounded staleness. In Proceedings of the 14th Workshop on Hot Topics in Operating Systems (HotOS), May 2013.
  • [12] D. D. Clark. Window and acknowledgement strategy in TCP. RFC 813, July 1982.
  • [13] N. Conway, W. R. Marczak, P. Alvaro, J. M. Hellerstein, and D. Maier. Logic and lattices for distributed programming. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC), Oct. 2012.
  • [14] J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, Feb. 2013.
  • [15] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2004.
  • [16] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2012.
  • [17] D. Hsu, N. Karampatziakis, J. Langford, and A. Smola. Parallel online learning. In R. Bekkerman, M. Bilenko, and J. Langford, editors, Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Dec. 2011.
  • [18] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the EuroSys Conference, Mar. 2007.
  • [19] Q. Ke, M. Isard, and Y. Yu. Optimus: A dynamic rewriting framework for execution plans of data-parallel computation. In Proceedings of the EuroSys Conference, Apr. 2013.
  • [20] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems, 18(3):263–297, Aug. 2000.
  • [21] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International World Wide Web Conference (WWW), Apr. 2010.
  • [22] J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, and D. Maier. Out-of-order processing: a new architecture for high-performance stream systems. Proceedings of the Very Large Database Endowment (PVLDB), 1(1):274–288, Aug. 2008.
  • [23] B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking: language, execution and optimization. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), June 2006.
  • [24] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica. Implementing declarative overlays. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP), Oct. 2005.
  • [25] B. T. Loo, J. M. Hellerstein, I. Stoica, and R. Ramakrishnan. Declarative routing: extensible routing with declarative queries. In Proceedings of the ACM International Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM), Aug. 2005.
  • [26] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010.
  • [27] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), June 2010.
  • [28] F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Proceedings of the 6th Conference on Innovative Data Systems Research (CIDR), Jan. 2013.
  • [29] C. Mitchell, R. Power, and J. Li. Oolong: asynchronous distributed applications made easy. In Proceedings of the 3rd Asia-Pacific Workshop on Systems (APSys), July 2012.
  • [30] D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. CIEL: a universal execution engine for distributed dataflow computing. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Mar. 2011.
  • [31] D. Nagle, D. Serenyi, and A. Matthews. The Panasas ActiveScale storage cluster: Delivering scalable high bandwidth storage. In Proceedings of the ACM/IEEE Supercomputing Conference (SC), Nov. 2004.
  • [32] J. Nagle. Congestion control in IP/TCP internetworks. RFC 896, Jan. 1984.
  • [33] M. Najork. The scalable hyperlink store. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, June 2009.
  • [34] M. Najork, D. Fetterly, A. Halverson, K. Kenthapadi, and S. Gollapudi. Of hammers and nails: an empirical comparison of three paradigms for processing large graphs. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), Feb. 2012.
  • [35] J. Pelissier. Providing quality of service over InfiniBand Architecture fabrics. In Proceedings of the 8th IEEE Symposium on High Performance Interconnects (HOT Interconnects), 2000.
  • [36] D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2010.
  • [37] D. P. Reed and R. K. Kanodia. Synchronization with eventcounts and sequencers. Communications of the ACM, 22(2):115–123, Feb. 1979.
  • [38] P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3), May/June 2002.
  • [39] M. Welsh, D. Culler, and E. Brewer. SEDA: an architecture for well-conditioned, scalable internet services. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Oct. 2001.
  • [40] R. Xin, J. Gonzalez, M. Franklin, and I. Stoica. GraphX: A resilient distributed graph system on Spark. In Proceedings of the Graph Data Management Experiences and Systems (GRADES) Workshop, June 2013.
  • [41] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2008.
  • [42] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Apr. 2012.
  • [43] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-tolerant streaming computation at scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), Nov. 2013.
  • [44] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2008.
  • [45] Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SoCC), Oct. 2011.
  • [46] Y. Zhang, Q. Gao, L. Gao, and C. Wang. Accelerate large-scale iterative computation through asynchronous accumulative updates. In Proceedings of the 3rd ACM Workshop on Scientific Cloud Computing (ScienceCloud), June 2012.
Best Paper
Received the SOSP Best Paper Award in 2013.