Making Sense of Performance in Data Analytics Frameworks.

NSDI, pp. 293-307, 2015

Cited by 472 | Views 295 | EI

Abstract

There has been much research devoted to improving the performance of data analytics frameworks, but comparatively little effort has been spent systematically identifying the performance bottlenecks of these systems. In this paper, we develop blocked time analysis, a methodology for quantifying performance bottlenecks in distributed computation frameworks …

Introduction
  • Large-scale data analytics frameworks such as Hadoop [13] and Spark [51] are in widespread use
  • As a result, both academia and industry have dedicated significant effort towards improving the performance of these frameworks
  • Much of this performance work has been motivated by three widely-accepted mantras about the performance of data analytics, one of which is that disk I/O is often the bottleneck
  • This has led to work on using the disk more efficiently [43] and caching data in memory [9, 30, 47, 51]
Highlights
  • Large-scale data analytics frameworks such as Hadoop [13] and Spark [51] are in widespread use
  • The first version operates on data stored in-memory using SparkSQL’s columnar cache and the second version operates on data stored on-disk using Hadoop Distributed File System (HDFS), which triply replicates data for fault-tolerance
  • To make sense of performance, we focus on blocked time analysis, which allows us to quantify how much more quickly a job would complete if tasks never blocked on the disk or the network (§3.3 explains why we cannot use blocked time analysis to understand CPU use); a minimal sketch of the idea follows this list
  • We find that the median improvement from eliminating all time blocked on disk is at most 19% across all workloads, as shown in Figure 3
  • This paper undertook a detailed performance study of three workloads, and found that for those workloads, jobs are often bottlenecked on CPU and not I/O, network performance has little impact on job completion time, and many straggler causes can be identified and fixed
  • We dug into this data and found that a typical job that spends more than 75% of time in its shuffle …
  • The takeaway from this work should be the importance of instrumenting systems for blocked time analysis, so that researchers and practitioners alike can understand how best to focus performance improvements
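
The following is a minimal, hypothetical sketch of the blocked time analysis idea referenced above, not the authors' implementation. It assumes per-task measurements of total runtime and of time blocked on one resource (disk or network), stages that run one after another, and a fixed number of task slots; the `simulate_stage` helper and all numbers are illustrative.

```python
import heapq


def simulate_stage(durations, num_slots):
    """Greedy list scheduling: assign each task to the earliest-free slot
    and return the time at which the last task finishes."""
    slots = [0.0] * num_slots          # finish time of each slot
    heapq.heapify(slots)
    for d in durations:
        start = heapq.heappop(slots)   # earliest-available slot
        heapq.heappush(slots, start + d)
    return max(slots)


def blocked_time_speedup(stages, num_slots):
    """stages: list of stages; each stage is a list of
    (task_runtime, time_blocked_on_resource) pairs, in seconds.
    Returns (original_job_time, job_time_with_blocking_removed)."""
    original = sum(
        simulate_stage([r for r, _ in stage], num_slots) for stage in stages)
    no_blocking = sum(
        simulate_stage([r - b for r, b in stage], num_slots) for stage in stages)
    return original, no_blocking


if __name__ == "__main__":
    # Two stages of four tasks on two slots; second value is time blocked on disk.
    stages = [[(10, 2), (12, 3), (9, 1), (11, 2)],
              [(20, 6), (18, 5), (22, 7), (19, 4)]]
    orig, ideal = blocked_time_speedup(stages, num_slots=2)
    print(f"original: {orig:.0f}s, without disk blocking: {ideal:.0f}s "
          f"({100 * (orig - ideal) / orig:.0f}% faster)")
```

The paper's actual analysis models Spark's scheduling in more detail; this sketch only conveys why the potential speedup is obtained by re-simulating the job with shorter tasks rather than by simply summing blocked time.
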
Methods
  • This section describes the workloads the authors ran, the blocked time analysis the authors used to understand performance, and the experimental setup
  • The authors' analysis centers around fine-grained instrumentation of two benchmarks and one production workload running on Spark, summarized in Table 1
  • The big data benchmark (BDBench) [46] was developed to evaluate the differences between analytics frameworks and was derived from a benchmark developed by Pavlo et al. [40]
  • The input dataset consists of HTML documents from the Common Crawl document corpus [2] combined with SQL summary tables generated using Intel's Hadoop benchmark tool [50]
  • The first three queries have three variants that each use the same input data size but have different result sizes, reflecting a spectrum between business-intelligence-like queries and ETL-like queries with large result sets that require many machines to store
  • The first version operates on data stored in memory using Spark SQL's columnar cache and the second version operates on data stored on disk using the Hadoop Distributed File System (HDFS), which triply replicates data for fault tolerance; a sketch of the two configurations follows this list
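
As a rough illustration of the two storage configurations described in the last bullet (not the authors' actual setup), the sketch below uses the modern PySpark API; the HDFS path and table name are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bdbench-sketch").getOrCreate()

# On-disk variant: queries read their input directly from HDFS, which
# triply replicates blocks for fault tolerance.
spark.read.parquet("hdfs:///benchmark/uservisits") \
     .createOrReplaceTempView("uservisits")

# In-memory variant: the same table is first materialized in Spark SQL's
# columnar in-memory cache, so subsequent queries avoid HDFS reads.
spark.sql("CACHE TABLE uservisits")

spark.sql("SELECT COUNT(*) FROM uservisits").show()
```

Before `CACHE TABLE` runs, queries over the view are served from HDFS; afterwards they are served from Spark SQL's in-memory columnar cache.
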
Results
  • The authors found that job runtime cannot improve by a median of more than 19% as a result of optimizing disk I/O
  • Previous measurements looked at the fraction of jobs that spent a certain percent of time in shuffle; by this metric, 16% of jobs spent more than 75% of time shuffling data
  • The authors dug into this data and found that a typical job that spends more than 75% of time in its shuffle …
  • The authors' instrumentation allows them to describe the cause of more than 60% of stragglers in 75% of the queries they ran; a rough sketch of this identification approach follows this list
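
The sketch below shows one plausible way to identify stragglers and attribute their causes from per-task resource times, in the spirit of the approach above; the 1.5x-median threshold, the resource names, and the attribution rule are assumptions, not the authors' exact procedure.

```python
from statistics import median

# Each task is a dict of per-resource times (seconds), e.g.
# {"total": 30.0, "gc": 12.0, "hdfs_read": 3.0, "shuffle_write": 1.0}.


def find_stragglers(stage_tasks, threshold=1.5):
    """Flag tasks whose runtime exceeds threshold x the stage median."""
    med = median(t["total"] for t in stage_tasks)
    return [t for t in stage_tasks if t["total"] > threshold * med], med


def attribute_cause(task, stage_tasks, med_total, threshold=1.5):
    """Blame the first resource whose excess time (relative to the stage
    median for that resource) would, if removed, make the task non-straggler."""
    for resource in ("gc", "hdfs_read", "shuffle_write"):
        med_resource = median(t.get(resource, 0.0) for t in stage_tasks)
        excess = task.get(resource, 0.0) - med_resource
        if task["total"] - excess <= threshold * med_total:
            return resource
    return "unexplained"


if __name__ == "__main__":
    stage = [
        {"total": 10, "gc": 1, "hdfs_read": 2, "shuffle_write": 1},
        {"total": 11, "gc": 1, "hdfs_read": 2, "shuffle_write": 1},
        {"total": 12, "gc": 1, "hdfs_read": 3, "shuffle_write": 1},
        {"total": 25, "gc": 14, "hdfs_read": 2, "shuffle_write": 1},  # GC-bound
    ]
    stragglers, med = find_stragglers(stage)
    for s in stragglers:
        print(s["total"], "->", attribute_cause(s, stage, med))
```
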
Conclusion
  • This paper undertook a detailed performance study of three workloads, and found that for those workloads, jobs are often bottlenecked on CPU and not I/O, network performance has little impact on job completion time, and many straggler causes can be identified and fixed.
  • Obscuring performance factors sometimes seems like a necessary cost of implementing new and more complex optimizations, but inevitably makes understanding how to optimize performance in the future much more difficult
Tables
  • Table 1: Summary of workloads run. We study one larger workload in §6
  • Table 2: Disk use for all MapReduce jobs run at Google
  • Table 3: Disk use for a cluster with tens of thousands of machines, running Cosmos. Compute (years) describes the sum of runtimes across all tasks [12]. The data includes jobs from two days of each month; see [11] for details
  • Table 4: The percent of jobs, task time, and bytes in jobs that spent different fractions of time in shuffle for the Facebook workload
Funding
  • Finally, we thank our shepherd, Venugopalan Ramasubramanian, for helping to shape the final version of the paper. This research is supported in part by a Hertz Foundation Fellowship, NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple, Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC, Ericsson, Facebook, Guavus, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Splunk, Virdata and VMware
References
  • [1] Apache Parquet. http://parquet.incubator.apache.org/.
  • [2] Common Crawl. http://commoncrawl.org/.
  • [3] Databricks. http://databricks.com/.
  • [4] Spark SQL. https://spark.apache.org/sql/.
  • [5] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen. Performance Debugging for Distributed Systems of Black Boxes. In Proc. SOSP, 2003.
  • [6] M. Al-fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In Proc. NSDI, 2010.
  • [7] G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In Proc. EuroSys, 2011.
  • [8] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective Straggler Mitigation: Attack of the Clones. In Proc. NSDI, 2013.
  • [9] G. Ananthanarayanan, A. Ghodsi, A. Wang, D. Borthakur, S. Kandula, S. Shenker, and I. Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In Proc. NSDI, 2012.
  • [10] G. Ananthanarayanan, M. C.-C. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu. GRASS: Trimming Stragglers in Approximation Analytics. In Proc. NSDI, 2014.
  • [11] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the Outliers in Map-Reduce Clusters using Mantri. In Proc. OSDI, 2010.
  • [12] G. Ananthanarayanan. Personal Communication, February 2015.
  • [13] Apache Software Foundation. Apache Hadoop. http://hadoop.apache.org/.
  • [14] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron. Towards Predictable Datacenter Networks. In Proc. SIGCOMM, 2011.
  • [15] P. Barham, A. Donnelly, R. Isaacs, and R. Mortier. Using Magpie for Request Extraction and Workload Modelling. In Proc. SOSP, 2004.
  • [17] M. Chowdhury, S. Kandula, and I. Stoica. Leveraging Endpoint Flexibility in Data-intensive Clusters. In Proc. SIGCOMM, 2013.
  • [18] M. Chowdhury and I. Stoica. Coflow: A Networking Abstraction for Cluster Applications. In Proc. HotNets, 2012.
  • [19] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica. Managing Data Transfers in Computer Clusters with Orchestra. In Proc. SIGCOMM, 2011.
  • [20] M. Chowdhury, Y. Zhong, and I. Stoica. Efficient Coflow Scheduling with Varys. In Proc. SIGCOMM, 2014.
  • [21] P. Costa, A. Donnelly, A. Rowstron, and G. O'Shea. Camdoop: Exploiting In-network Aggregation for Big Data Applications. In Proc. NSDI, 2012.
  • [22] A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Cetintemel, and S. B. Zdonik. Tupleware: Redefining Modern Analytics. CoRR, 2014.
  • [23] J. Dean. Personal Communication, February 2015.
  • [24] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. CACM, 51(1):107–113, Jan. 2008.
  • [25] J. Erickson, M. Kornacker, and D. Kumar. New SQL Choices in the Apache Hadoop Ecosystem: Why Impala Continues to Lead. http://goo.gl/evDBfy, 2014.
  • [26] B. Gufler, N. Augsten, A. Reiser, and A. Kemper. Load Balancing in MapReduce Based on Scalable Cardinality Estimates. In Proc. ICDE, pages 522–533, 2012.
  • [27] Z. Guo, X. Fan, R. Chen, J. Zhang, H. Zhou, S. McDirmid, C. Liu, W. Lin, J. Zhou, and L. Zhou. Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE. In Proc. OSDI, 2012.
  • [28] V. Jeyakumar, M. Alizadeh, D. Mazieres, B. Prabhakar, C. Kim, and A. Greenberg. EyeQ: Practical Network Performance Isolation at the Edge. In Proc. NSDI, 2013.
  • [29] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia. SkewTune: Mitigating Skew in MapReduce Applications. In Proc. SIGMOD, pages 25–36, 2012.
  • [30] H. Li, A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica. Reliable, Memory Speed Storage for Cluster Computing Frameworks. In Proc. SoCC, 2014. http://goo.gl/INds4t, August 2013.
  • [32] Oracle. The Java HotSpot Performance Engine Architecture. http://www.oracle.com/technetwork/java/whitepaper-135217.html.
  • [33] K. Ousterhout. Display filesystem read statistics with each task. https://issues.apache.org/jira/browse/SPARK-1683.
  • [34] K. Ousterhout. Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies. https://issues.apache.org/jira/browse/SPARK-2571.
  • [35] K. Ousterhout. Shuffle write time does not include time to open shuffle files. https://issues.apache.org/jira/browse/SPARK-3570.
  • [36] K. Ousterhout. Shuffle write time is incorrect for sort-based shuffle. https://issues.apache.org/jira/browse/SPARK-5762.
  • [38] K. Ousterhout. Time to cleanup spilled shuffle files not included in shuffle write time. https://issues.apache.org/jira/browse/SPARK-5845.
  • [39] K. Ousterhout, A. Panda, J. Rosen, S. Venkataraman, R. Xin, S. Ratnasamy, S. Shenker, and I. Stoica. The Case for Tiny Tasks in Compute Clusters. In Proc. HotOS, 2013.
  • [40] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A Comparison of Approaches to Large-scale Data Analysis. In Proc. SIGMOD, 2009.
  • [41] L. Popa, G. Kumar, M. Chowdhury, A. Krishnamurthy, S. Ratnasamy, and I. Stoica. FairCloud: Sharing The Network in Cloud Computing. In Proc. SIGCOMM, 2012.
  • [42] P. Prakash, A. Dixit, Y. C. Hu, and R. Kompella. The TCP Outcast Problem: Exposing Unfairness in Data Center Networks. In Proc. NSDI, 2012.
  • [43] A. Rasmussen, V. T. Lam, M. Conley, G. Porter, R. Kapoor, and A. Vahdat. Themis: An I/O-efficient MapReduce. In Proc. SoCC, 2012.
  • [44] K. Sakellis. Track local bytes read for shuffles - update UI. https://issues.apache.org/jira/browse/SPARK-5645.
  • [45] Transaction Processing Performance Council (TPC). TPC Benchmark DS Standard Specification. http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf, 2012.
  • [46] UC Berkeley AmpLab. Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark/, February 2014.
  • [47] A. Wang and C. McCabe. In-memory Caching in HDFS: Lower Latency, Same Great Taste. Presented at Hadoop Summit, 2014.
  • [48] D. Xie, N. Ding, Y. C. Hu, and R. Kompella. The Only Constant is Change: Incorporating Time-Varying Network Reservations in Data Centers. In Proc. SIGCOMM, 2012.
  • [49] N. J. Yadwadkar, G. Ananthanarayanan, and R. Katz. Wrangler: Predictable and Faster Jobs Using Fewer Resources. In Proc. SoCC, 2014.
  • [50] L. Yi, K. Wei, S. Huang, and J. Dai. HiBench. https://github.com/intel-hadoop/HiBench, 2012.
  • [51] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proc. NSDI, 2012.
  • [52] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In Proc. OSDI, 2008.
  • [53] J. Zhang, H. Zhou, R. Chen, X. Fan, Z. Guo, H. Lin, J. Y. Li, W. Lin, J. Zhou, and L. Zhou. Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions. In Proc. NSDI, 2012.