iBTune: individualized buffer tuning for large-scale cloud databases
Proceedings of the VLDB Endowment, pp. 1221-1234, 2019.
Abstract:
Tuning the buffer size appropriately is critical to the performance of a cloud database, since memory is usually the resource bottleneck. For large-scale databases supporting heterogeneous applications, configuring the individual buffer sizes for a significant number of database instances presents a scalability challenge. Manual optimization…
Introduction
- The buffer pool is a critical resource for an OLTP database, serving as a data cache that guarantees desirable system performance.
- Existing buffer pool configurations are almost unanimously based on database administrators' (DBAs') experience and often take a small, fixed number of recommended values.
- This manual process is neither efficient nor effective, and is not even feasible for large cloud clusters, especially when the workload may change dynamically on individual database instances.
- (Table 1 excerpt: the buffer pool, at roughly 29.6 GB, dominates memory usage, while the other pools are each 200 MB or less, e.g., the sort buffer at 1.25 MB.)
Highlights
- The buffer pool is a critical resource for an OLTP database, serving as a data cache that guarantees desirable system performance
- We see that response time (RT) increases by around 30%–50%, but latency still remains relatively low
- The performance (RT and queries per second (QPS)) still meets the quality of service after we reduce the buffer size
- We propose iBTune to adjust DBMS buffer pool sizes by using a large deviation analysis for least recently used (LRU) caching models and by leveraging similar instances, identified from performance metrics, to find tolerable miss ratios (see the sketch after this list)
- The deployment in our large-scale production environment shows that this solution can save more than 17% of memory compared to the original system, which relies only on experienced database administrators (DBAs)
- This paper focuses on shrinking buffer pool sizes to reduce cost, which is by far the most important issue in our production deployment
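For LRU caches serving Zipf-like request streams, large-deviation results (e.g., Jelenkovic's analysis in the references) suggest the miss ratio decays roughly as a power law in the cache size, mr(s) ≈ C·s^(−α). Under that assumption, a current (size, miss ratio) pair plus a tolerable miss ratio borrowed from similar instances pins down a candidate buffer pool size in closed form. The Python sketch below illustrates only this arithmetic; the power-law form, the exponent value, and the function name are illustrative assumptions, not the paper's exact formula.

```python
# Hedged sketch: assumes an LRU miss ratio that follows a power law in the
# cache size, mr(s) ~= C * s**(-alpha), as suggested by large-deviation
# analyses of LRU under Zipf-like popularity. Not the paper's exact formula.

def new_buffer_size(cur_size_gb: float, cur_mr: float,
                    mr_tol: float, alpha: float) -> float:
    """Eliminate C using the current (size, miss ratio) point:
    mr_tol / cur_mr = (s_new / cur_size)**(-alpha)
    =>  s_new = cur_size * (cur_mr / mr_tol)**(1 / alpha)
    """
    assert cur_mr > 0 and mr_tol > 0 and alpha > 0
    return cur_size_gb * (cur_mr / mr_tol) ** (1.0 / alpha)

# Example: a 96 GB pool at a 0.02% miss ratio, a tolerable miss ratio of
# 0.03% taken from similar instances, and a fitted alpha of 2.6 (assumed):
print(round(new_buffer_size(96.0, 2e-4, 3e-4, 2.6), 1))  # -> 82.1
```

The practical point is that the tolerable miss ratio, not the size itself, is what gets transferred from similar instances; the candidate size then falls out of the (assumed) miss-ratio curve.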
Results
- The authors first examine the online buffer pool adjustments in the production environment (Section 4.2.1, online adjustment of buffer pool sizes).
- The authors compare performance before and after adjusting the buffer pool sizes, using the sizes computed by iBTune.
- The algorithm adjusts the buffer pool size from 96 GB to 86 GB, about a 10% reduction.
- Most RTs after adjustment are below, and close to, the predicted upper bound of the response time, indicating that the algorithm predicts the RT upper bound reasonably well.
- The performance (RT and QPS) still meets the quality of service after the buffer size is reduced (see the decision sketch after this list).
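Reading the rollout in Section 4.2.1 as a guarded action, a shrink is only applied when the predicted RT upper bound at the candidate size still meets the instance's service-level target. Below is a hedged sketch of that decision rule; `predict_rt_upper_bound` stands in for the paper's learned RT predictor, and its name, signature, and the 1 ms SLA in the example are assumptions.

```python
# Hedged sketch of an apply-if-safe rollout check. `predict_rt_upper_bound`
# is a placeholder for the learned RT predictor; names are assumptions.

def maybe_shrink(cur_size_gb, cand_size_gb, rt_sla_ms, predict_rt_upper_bound):
    if cand_size_gb >= cur_size_gb:
        return cur_size_gb                  # only shrinking is automated
    rt_ub_ms = predict_rt_upper_bound(cand_size_gb)
    if rt_ub_ms <= rt_sla_ms:
        return cand_size_gb                 # predicted RT stays within SLA
    return cur_size_gb                      # otherwise keep the current size

# Example: shrinking 96 GB -> 86 GB is accepted only because the toy
# predictor's upper bound (0.9 ms) is within the assumed 1 ms SLA.
print(maybe_shrink(96.0, 86.0, 1.0, lambda size_gb: 0.9))  # -> 86.0
```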
Conclusion
- The authors propose iBTune to adjust DBMS buffer pool sizes by using a large deviation analysis for LRU caching models and by leveraging similar instances, identified from performance metrics, to find tolerable miss ratios.
- The deployment in the large-scale production environment shows that this solution can save more than 17% of memory compared to the original system, which relies only on experienced DBAs.
- Future work: this paper focuses on shrinking buffer pool sizes to reduce cost, which is by far the most important issue in the production deployment.
- The authors currently rely on DBAs to manually analyze system expansion requirements before taking important actions.
- They will explore how to automatically expand buffer pools in the future.
Tables
- Table1: Usage of different memory pools
- Table2: Average QPS from different business units
- Table3: Machine configurations
- Table4: Average memory saving ratios for different sizes
- Table5: Online performance
- Table6: Training set performance (%)
- Table7: Testing set performance (%)
Related work
- Database parameter tuning has been an active area in recent years. Andy Pavlo et al. proposed a framework [31] for self-driving DBMSs, including several key components such as a runtime architecture, workload modeling, and a control framework. They extended this framework to automatically tune DBMS knob configurations in a system called OtterTune [42]. OtterTune uses a LASSO algorithm to select the most impactful knobs and recommends knob settings based on Gaussian Processes, training its models on hundreds of metrics collected under different configurations. OtterTune's objective is to achieve good performance for a single DBMS instance by tuning important parameters in the configuration file of a DBMS kernel, while our goal is to optimize memory usage by tuning the buffer pool sizes of many different database instances. A toy illustration of this two-stage pipeline follows.
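The sketch below is not OtterTune's code; it mimics the two stages described above with scikit-learn on synthetic data. Lasso zeroes out low-impact knobs, and a Gaussian Process then maps the surviving knobs to a performance metric with uncertainty estimates that a recommendation loop could exploit. All data, knob counts, and hyperparameters are assumptions.

```python
# Illustrative LASSO-then-GP pipeline on synthetic data (not OtterTune code).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 10))   # 200 configs x 10 knobs
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=200)  # 2 knobs matter

# Stage 1: Lasso shrinks low-impact knob coefficients to exactly zero.
impactful = np.flatnonzero(Lasso(alpha=0.05).fit(X, y).coef_)  # e.g. [0, 3]

# Stage 2: a GP over the impactful knobs predicts performance with
# uncertainty, which a search loop could use to propose the next config.
gp = GaussianProcessRegressor().fit(X[:, impactful], y)
candidates = rng.uniform(0.0, 1.0, size=(5, impactful.size))
mean, std = gp.predict(candidates, return_std=True)
```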
Key findings
- The successful deployment in a production environment, which safely reduces the memory footprint by more than 17% compared to the original system that relies on manual configurations, demonstrates the effectiveness of our solution
- Since iBTune was deployed online, we have successfully reduced memory consumption by more than 17% while still satisfying the required quality of service for our diverse business applications
- Compared with a model that uses only a single data point from the original data set, we improve performance by utilizing all observations from similar environments, as demonstrated by real experiments in Section 4.2.2 (see the sketch after this list)
- We see that RT increases by around 30%–50%, but latency remains relatively low (under 1 ms)
- Most of the observed results (more than 83% for RT and 85% for MR) are consistent with the predictions
- The average hourly RT varies by more than 70% across the training and testing data sets due to workload changes
- The deployment in our large-scale production environment shows that this solution can save more than 17% of memory compared to the original system, which relies only on experienced DBAs
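One concrete reading of "utilizing all observations in similar environments" is nearest-neighbor pooling over performance-metric vectors, sketched below under stated assumptions: instances are represented by normalized metric vectors (QPS, CPU, logical reads, miss ratio, and so on), neighbors are ranked by Euclidean distance, and a conservative quantile of the neighbors' miss ratios serves as the tolerable miss ratio. The metric set, k, and the quantile are illustrative, not the paper's exact choices.

```python
# Hedged sketch: pool observations from the k most similar instances,
# where similarity is Euclidean distance over normalized metric vectors.
import numpy as np

def k_similar_instances(metrics: np.ndarray, target_idx: int, k: int = 3):
    """Indices of the k instances nearest to `target_idx`."""
    z = (metrics - metrics.mean(0)) / (metrics.std(0) + 1e-9)  # z-score
    dist = np.linalg.norm(z - z[target_idx], axis=1)
    dist[target_idx] = np.inf               # exclude the instance itself
    return np.argsort(dist)[:k]

# Example: 5 instances x 4 metrics; take a low quantile of the neighbors'
# miss ratios as a conservative tolerable miss ratio (choices assumed).
metrics = np.random.default_rng(1).uniform(size=(5, 4))
neighbors = k_similar_instances(metrics, target_idx=0, k=2)
miss_ratios = np.array([2e-4, 4e-4, 3e-4, 5e-4, 1e-4])
mr_tol = float(np.quantile(miss_ratios[neighbors], 0.25))
```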
Reference
- Docker. https://www.docker.com.
- M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. B. Zdonik. Learning-based query performance modeling and prediction. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE '12, pages 390–401, Washington, DC, USA, 2012. IEEE Computer Society.
- M. Arlitt and C. L. Williamson. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking, October 1997.
- C. Berthet. Approximation of LRU caches miss rate: Application to power-law popularities. arXiv:1705.10738, 2017.
- L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of the 18th Conference on Information Communications, 1999.
- T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
- F. J. Corbato. A paging experiment with the multics system. MIT Project MAC Report, MAC-M-384, 1968.
- G. Dan and N. Carlsson. Power-law revisited: Large scale measurement study of p2p content popularity. In Proceedings of the 9th International Conference on Peer-to-peer Systems, IPTPS’10, pages 12–12, Berkeley, CA, USA, 2010. USENIX Association.
- S. Das, F. Li, V. R. Narasayya, and A. C. Konig. Automated demand-driven resource scaling in relational database-as-a-service. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pages 1923–1934, New York, NY, USA, 2016. ACM.
- K. G. Derpanis. Overview of the ransac algorithm. Image Rochester NY, 4(1):2–3, 2010.
- E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE transactions on pattern analysis and machine intelligence, 35(11):2765–2781, 2013.
- C. Fricker, P. Robert, and J. Roberts. A versatile and accurate approximation for LRU cache performance. In Proceedings of the 24th International Teletraffic Congress, page 8. International Teletraffic Congress, 2012.
- A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE ’09, pages 592–603, Washington, DC, USA, 2009. IEEE Computer Society.
- S. Garcia, J. Derrac, J. Cano, and F. Herrera. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE transactions on pattern analysis and machine intelligence, 34(3):417–435, 2012.
- Y. Geng, S. Liu, Z. Yin, A. Naik, B. Prabhakar, M. Rosenblum, and A. Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pages 81–94, Renton, WA, 2018. USENIX Association.
- P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine learning, 63(1):3–42, 2006.
- G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-engine: An optimized storage engine for large-scale e-commerce transaction processing. In Proceedings of the 2019 ACM International Conference on Management of Data, SIGMOD ’19. ACM, 2019.
- P. J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35(1):73–101, 03 1964.
- P. R. Jelenkovic. Least-recently-used caching with Zipf's law requests. In The Sixth INFORMS Telecommunications Conference, Boca Raton, Florida, 2002.
- A. Kadiyala and A. Kumar. Applications of python to evaluate the performance of bagging methods. Environmental Progress & Sustainable Energy, 37(5):1555–1559, 2018.
- T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. In SIGMOD, pages 489–504, 2018.
- S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. ArXiv e-prints, Aug. 2018.
- L. Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133–169, 1998.
- D. Lee, J. Choi, J.-H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies. In Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '99, pages 134–143, New York, NY, USA, 1999. ACM.
- Z. L. Li, M. C.-J. Liang, W. He, L. Zhu, W. Dai, J. Jiang, and G. Sun. Metis: Robustly tuning tail latencies of cloud systems. In ATC (USENIX Annual Technical Conference). USENIX, July 2018.
- A. Liaw, M. Wiener, et al. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
- L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon. Query-based workload forecasting for self-driving database management systems. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, pages 631–645, New York, NY, USA, 2018. ACM.
- V. Narasayya, I. Menache, M. Singh, F. Li, M. Syamala, and S. Chaudhuri. Sharing buffer pool memory in multi-tenant relational database-as-a-service. PVLDB, 8(7):726–737, 2015.
- D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting dbms. In 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages 239–248, Sept 2005.
- E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD Record, 22(2):297–306, 1993.
- A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 2017 Conference on Innovative Data Systems Research, CIDR ’17, 2017.
- J. Petrovic. Using Memcached for data distribution in industrial environment. In Proceedings of the Third International Conference on Systems, ICONS '08, pages 368–372, April 2008.
- S. Podlipnig and L. Boszormenyi. A survey of web cache replacement strategies. ACM Computing Surveys (CSUR), 35(4):374–398, Dec. 2003.
- L. Rokach and O. Z. Maimon. Data mining with decision trees: theory and applications, volume 69. World scientific, 2008.
- D. L. Shrestha and D. P. Solomatine. Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Computation, 18(7):1678–1710, 2006.
- Y. Smaragdakis, S. Kaplan, and P. Wilson. The EELRU adaptive replacement algorithm. Performance Evaluation, 53(2):93–123, July 2003.
- A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06, pages 1081–1092. VLDB Endowment, 2006.
- T. Sugimoto and N. Miyoshi. On the asymptotics of fault probability in least-recently-used caching with Zipf-type request distribution. Random Structures & Algorithms, 29(3):296–323, 2006.
- R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, pages 205–219, New York, NY, USA, 2018. ACM.
- J. Tan, G. Quan, K. Ji, and N. Shroff. On resource pooling and separation for LRU caching. In Proceedings of the 2018 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, 2018.
- D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1):3:1–3:25, May 2008.
- D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 1009–1024, New York, NY, USA, 2017. ACM.
- J. Wang. A survey of web caching schemes for the internet. SIGCOMM Computer Communication Review, 29(5):36–46, Oct. 1999.
- W. Wu, Y. Chi, H. Hacıgumus, and J. F. Naughton. Towards predicting query execution time for concurrent and dynamic database workloads. PVLDB, 6(10):925–936, 2013.
- Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Characterizing Facebook's Memcached workload. IEEE Internet Computing, 18(2):41–49, 2014.
- Y. Yang and J. Zhu. Write skew and zipf distribution: Evidence and implications. ACM Trans. Storage, 12(4):21:1–21:19, June 2016.
- J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 2061–2064. ACM, 2009.
- H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.