Graphalytics: A Big Data Benchmark for Graph-Processing Platforms
Graphs are increasingly used in industry, governance, and science. This has stimulated the appearance of many and diverse graph-processing platforms. Although platform diversity is beneficial, it also makes it very challenging to select the best platform for an application domain or one of its important applications, and to design new and...
- Generic big data processing platforms, such as Hadoop, can process graphs, but are generally slow for challenging graph-processing algorithms [3, 4] or graph datasets [4, 7].
- Several studies have compared the performance of graph processing platforms [3, 4, 7] using multiple algorithms and/or datasets, but the de facto benchmarking standard is currently Graph500, which is limited to a single algorithm applied to a synthetic graph model.
- The authors present their vision for Graphalytics, a big data benchmark for graph-processing platforms.
- Graph data is increasingly used in industry, governance, and science
- We propose using the data generator of the Linked Data Benchmark Council (LDBC) Social Network Benchmark.
- The Report Generator produces the main outcome of Graphalytics: a detailed report on the performance of the system under test (SUT) during the benchmark, which includes all relevant configuration information.
- Graphalytics has a database for Datasets, which includes preconfigured graphs ready to be used with Graphalytics
- Graphalytics focuses on diverse datasets and algorithms, and methodologically it addresses the shortcomings of related work.
- Unlike previous work, including our own, Graphalytics focuses on a fundamental understanding of choke points, extensions to the dataset generation, and an advanced benchmarking harness that will evolve into a public database of useful results.
- [Figure residue; recoverable component labels: System Monitor; platform-specific algorithm implementation; graph-processing platform (for each supported platform).]
- Although Graphalytics is still in an early phase of development, it has already enabled the authors to enrich their previous graph benchmarking results with new datasets and platforms.
- The single machine is faster than the cluster for smaller graphs, where computation is mostly CPU-bound.
- The data generator can produce a graph with 1.3 billion edges in about 3 hours.
- The authors note that GraphX is significantly slower than Giraph for the CONN algorithm (∼3×), ...
- Benchmarking graph-processing platforms enables system comparison, tuning, and design for an increasing number of domains.
- Responding to a dearth of comprehensive benchmarking approaches for graph-processing platforms, in this work the authors have proposed the vision: Graphalytics.
- Unlike previous work, including the authors' own, Graphalytics focuses on a fundamental understanding of choke points, extensions to the dataset generation, and an advanced benchmarking harness that will evolve into a public database of useful results.
- Graphalytics aims to become an accepted benchmarking standard by both the LDBC and the SPEC Research Group communities, and attract further implementations from the creators of graph-processing platforms themselves.
- Table 1: Characteristics of real graphs
- Throughout this work, we have compared the Graphalytics benchmark with other benchmarks proposed for graph processing [7, 13, 22]. In summary, Graphalytics is much more comprehensive and ambitious than previous work: it supports more diverse and realistic datasets [4, 16], more diverse and realistic algorithms, and reference implementations for more platforms (preliminary results obtained for 10 platforms [4, 5]). Moreover, Graphalytics includes in its vision a fundamental understanding of choke points, extensions to the dataset generation, and an advanced benchmarking harness that will evolve into a public database of useful results.
- This research was supported by Oracle Labs, LDBC (ldbcouncil.org, originally funded by EU project FP7-317548), Dutch NWO KIEM project KIESA, COMMIT project COMMIssioner, Ministry of Science and Innovation of Spain (TIN2013-47008-R), and Generalitat de Catalunya (SGR2014-890)
- M. Dayarathna and T. Suzumura. Graph database benchmarking on cloud environments with XGDBench. Autom. Softw. Eng., 21(4):509–533, 2014.
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2004.
- B. Elser and A. Montresor. An evaluation study of BigData frameworks for graph processing. In IEEE International Conference on Big Data, pages 60–67. IEEE, Oct. 2013.
- Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis. In IEEE IPDPS, pages 395–404. IEEE, May 2014.
- Y. Guo, A. L. Varbanescu, A. Iosup, and D. H. J. Epema. An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems. In CCGRID, pages 927–932, 2015 (in print, available online: http://www.pds.ewi.tudelft.nl/~iosup/perf-eval-gpu-graph-processing15ccgrid.pdf).
- Y. Guo, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. Benchmarking Graph-Processing Platforms: A Vision. In ACM/SPEC International Conference on Performance Engineering (ICPE), pages 289–292. ACM Press, 2014.
- M. Han, K. Daudjee, K. Ammar, M. T. Ozsu, X. Wang, and T. Jin. An Experimental Comparison of Pregel-like Graph Processing Systems. In VLDB, 2014.
- C. Herrera and P. J. Zufiria. Generating scale-free networks with adjustable clustering coefficient via random walks. arXiv preprint arXiv:1105.3347, 2011.
- A. Iosup, A. L. Varbanescu, M. Capota, T. Hegeman, Y. Guo, W. L. Ngai, and M. Verstraaten. Towards Benchmarking IaaS and PaaS Clouds for Graph Analytics. In Workshop on Big Data Benchmarking (WBDB), Potsdam, Germany, 2014.
- J. Leskovec, D. Chakrabarti, J. M. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. J Mach Learn Res, 11:985–1042, 2010.
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In ACM SIGKDD, 2005.
- I. X. Y. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards real-time community detection in large networks. Phys. Rev. E, 79:066107, Jun 2009.
- Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation. In VLDB, 2014.
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In ACM International Conference on Management of Data (SIGMOD), page 135. ACM Press, 2010.
- M. Pham, P. A. Boncz, and O. Erling. S3G2: A scalable structure-correlated social graph generator. In TPCTC, 2012.
- A. Prat-Perez and A. Averbuch. Benchmark design for navigational pattern matching benchmarking. Deliverable 3.3.34, LDBC, October 2014. [Online] Available: http://ldbc.eu/sites/default/files/LDBC_D3.3.34.pdf.
- A. Prat-Perez and D. Dominguez-Sal. How community-like is the structure of synthetically generated graphs? In GRADES, pages 7:1–7:9. ACM, 2014.
- A. Prat-Perez, D. Dominguez-Sal, and J. Larriba-Pey. Social based layouts for the increase of locality in graph operations. In International Conference on Database Systems for Advanced Applications (DASFAA), 2011.
- J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503, 2011.
- E. Volz. Random networks with tunable degree distribution and clustering. Physical Review E, 70(5):056115, 2004.
- R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: A Resilient Distributed Graph System on Spark. In GRADES, pages 1–6. ACM Press, 2013.
- Y. Zhao, K. Yoshigoe, M. Xie, and S. Zhou. Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks. Journal of Cyber Security and Mobility, 3:289–316, 2014.