# Refined bounds for algorithm configuration: The knife-edge of dual class approximability

ICML, pp. 580-590, 2020.

Abstract:

Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We in…

Introduction

- Algorithms typically have tunable parameters that significantly impact their performance, measured in terms of runtime, solution quality, and so on.
- Machine learning is often used to automate parameter tuning [20, 21, 23, 42]: given a training set of problem instances from the application domain at hand, this automated procedure returns a parameter setting that will ideally perform well on future, unseen instances.
- Generalization bounds provide guidance for selecting the training set size.
- They bound the difference between an algorithm’s performance on average over the training set and its expected performance on unseen instances.
- These bounds can be used to evaluate a parameter setting returned by any black-box procedure: they bound the difference between that parameter’s average performance on the training set and its expected performance
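As a concrete illustration of how such a bound guides training-set sizing, consider the simplest case: a single fixed parameter setting whose performance is scaled to [0, 1]. Hoeffding's inequality then bounds the gap between average training performance and expected performance by √(ln(2/δ)/(2N)). A minimal sketch (the function names are illustrative, not from the paper; the paper's contribution is the much harder uniform bound over the whole parameter space):

```python
import math

def hoeffding_gap(n: int, delta: float) -> float:
    """Width of the two-sided Hoeffding confidence interval for the mean
    of n i.i.d. samples bounded in [0, 1], valid with probability 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest n such that hoeffding_gap(n, delta) <= epsilon."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

n = samples_needed(0.05, 0.01)            # gap <= 0.05 with probability 0.99
print(n, hoeffding_gap(n, 0.01))
```

Note that this only covers one parameter setting chosen in advance; bounding the gap simultaneously for every setting (so the bound applies to whatever a black-box tuner returns) is exactly what the generalization guarantees in the paper provide.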

Highlights

- Algorithms typically have tunable parameters that significantly impact their performance, measured in terms of runtime, solution quality, and so on
- In our integer programming experiments, we show that this data-dependent generalization guarantee can be much tighter than the best-known worst-case guarantee
- We provided generalization guarantees for algorithm configuration, which bound the difference between a parameterized algorithm’s average empirical performance over a set of sample problem instances and its expected performance on future, unseen instances
- We showed that if this approximation holds under the L∞-norm, it is possible to provide strong generalization guarantees.
- When the approximation holds only under the Lp-norm for p < ∞, we showed that it is impossible in the worst case to provide nontrivial bounds.
- Via experiments in the context of integer programming algorithm configuration, we demonstrated that our bounds can be significantly stronger than the best-known worst-case guarantees [7], leading to a sample complexity improvement of 70,000%.
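The L∞-versus-Lp knife-edge in the title can be seen in a toy example: a discontinuous, performance-like function can be approximated well on average (small Lp error) by a smooth function even though the L∞ error stays bounded away from zero at the discontinuity. A minimal numeric sketch (illustrative only, not the paper's construction):

```python
import math

def linf_error(f, g, xs):
    """Worst-case (L-infinity) approximation error over the grid xs."""
    return max(abs(f(x) - g(x)) for x in xs)

def l2_error(f, g, xs):
    """Average-case (discretized L2) approximation error over the grid xs."""
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in xs) / len(xs))

step = lambda x: 1.0 if x >= 0.5 else 0.0                 # discontinuous target
smooth = lambda x: 1.0 / (1.0 + math.exp(-200.0 * (x - 0.5)))  # steep sigmoid

xs = [i / 1000.0 for i in range(1001)]
print(l2_error(step, smooth, xs))    # small: good approximation on average
print(linf_error(step, smooth, xs))  # stuck at 0.5 right at the jump
```

No matter how steep the sigmoid is made, its value at the jump is 0.5 while the step is 1, so the L∞ error never drops below 0.5 even as the L2 error goes to zero.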

Methods

- The authors analyze distributions over IPs formulating the combinatorial auction winner determination problem under the OR-bidding language [41], which the authors generate using the Combinatorial Auction Test Suite (CATS) [34].
- The authors use the “arbitrary” generator with 200 bids and 100 goods, resulting in IPs with about 200 variables, and the “regions” generator with 400 bids and …
- The authors use the algorithm described in Appendix D.1 of Balcan et al. [7] to compute the functions f_x∗.
- Their implementation overrides the default variable selection policy (VSP) of CPLEX 12.8.0.0 using the C API.
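The parameter tuned in this line of work [7] interpolates between branching scoring rules when selecting which variable to branch on. As a rough sketch of the idea only (not the authors' implementation; the scoring rules and function names here are stand-ins), a μ-parameterized variable selection policy might score each fractional variable as a convex combination of two base scores and branch on the maximizer:

```python
def interpolated_score(mu, score_a, score_b):
    """Convex combination of two branching scores; mu in [0, 1] is the tuned parameter."""
    return mu * score_a + (1.0 - mu) * score_b

def choose_branching_variable(mu, candidates):
    """candidates: list of (var_index, score_a, score_b) for fractional variables.
    Returns the index of the variable with the largest interpolated score."""
    return max(candidates, key=lambda c: interpolated_score(mu, c[1], c[2]))[0]

# Hypothetical candidate variables with their two base scores.
cands = [(0, 0.9, 0.1), (1, 0.2, 0.8), (2, 0.5, 0.5)]
print(choose_branching_variable(0.0, cands))  # relies entirely on score_b
print(choose_branching_variable(1.0, cands))  # relies entirely on score_a
```

Because the chosen variable (and hence the whole search tree) can flip discontinuously as μ crosses a threshold, tree size as a function of μ is piecewise constant, which is what makes the learning-theoretic analysis delicate.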

Results

- In Figure 2, the authors see that their data-dependent bound significantly beats the worst-case bound until there are approximately 100,000,000 training instances.

Conclusion

- The authors provided generalization guarantees for algorithm configuration, which bound the difference between a parameterized algorithm’s average empirical performance over a set of sample problem instances and its expected performance on future, unseen instances.
- When the approximation holds only under the Lp-norm for p < ∞, the authors showed that it is impossible in the worst case to provide nontrivial bounds.
- Via experiments in the context of integer programming algorithm configuration, the authors demonstrated that the bounds can be significantly stronger than the best-known worst-case guarantees [7], leading to a sample complexity improvement of 70,000%.


Funding

- We thank Kevin Leyton-Brown for a stimulating discussion that inspired us to pursue this research direction. This material is based on work supported by the National Science Foundation under grants CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS-1618714, IIS-1718457, IIS-1901403, and SES-1919453; the ARO under awards W911NF-17-1-0082 and W911NF2010081; a fellowship from Carnegie Mellon University's Center for Machine Learning and Health; the Defense Advanced Research Projects Agency under cooperative agreement HR0011202000; an Amazon Research Award; an AWS Machine Learning Research Award; a Bloomberg Research Grant; and a Microsoft Research Faculty Fellowship.

Reference

- Tobias Achterberg. SCIP: solving constraint integer programs. Mathematical Programming Computation, 1(1):1–41, 2009.
- Alejandro Marcos Alvarez, Quentin Louveaux, and Louis Wehenkel. A machine learning-based approximation of strong branching. INFORMS Journal on Computing, 29(1):185–195, 2017.
- Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
- Patrick Assouad. Densite et dimension. Annales de l’Institut Fourier, 33(3):233–282, 1983.
- Amine Balafrej, Christian Bessiere, and Anastasia Paparrizou. Multi-armed bandits for adaptive constraint propagation. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.
- Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. Conference on Learning Theory (COLT), 2017.
- Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
- Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), 2018.
- Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized Lloyd’s families. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? arXiv preprint arXiv:1908.02894, 2019.
- Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. Proceedings of the International Conference on Learning Representations (ICLR), 2020.
- Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Learning to optimize computational resources: Frugal training with generalization guarantees. AAAI Conference on Artificial Intelligence (AAAI), 2020.
- Evelyn Beale. Branch and bound methods for mathematical programming systems. Annals of Discrete Mathematics, 5:201–219, 1979.
- Michel Benichou, Jean-Michel Gauthier, Paul Girodet, Gerard Hentges, Gerard Ribiere, and O Vincent. Experiments in mixed-integer linear programming. Mathematical Programming, 1 (1):76–94, 1971.
- Giovanni Di Liberto, Serdar Kadioglu, Kevin Leo, and Yuri Malitsky. Dash: Dynamic approach for switching heuristics. European Journal of Operational Research, 248(3):943–953, 2016.
- J-M Gauthier and Gerard Ribiere. Experiments in mixed-integer linear programming using pseudo-costs. Mathematical Programming, 12(1):26–47, 1977.
- Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
- David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and computation, 100(1):78–150, 1992.
- He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2014.
- Eric Horvitz, Yongshao Ruan, Carla Gomez, Henry Kautz, Bart Selman, and Max Chickering. A Bayesian approach to tackling hard computational problems. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2001.
- Frank Hutter, Holger Hoos, Kevin Leyton-Brown, and Thomas Stutzle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1): 267–306, 2009. ISSN 1076-9757.
- Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization (LION), pages 507–523, 2011.
- Serdar Kadioglu, Yuri Malitsky, Meinolf Sellmann, and Kevin Tierney. ISAC-instance-specific algorithm configuration. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2010.
- Jorg Hendrik Kappes, Markus Speth, Gerhard Reinelt, and Christoph Schnorr. Towards efficient and exact map-inference for large scale discrete computer vision problems via combinatorial optimization. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1752–1758. IEEE, 2013.
- Elias Boutros Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In AAAI Conference on Artificial Intelligence (AAAI), 2016.
- Elias Boutros Khalil, Bistra Dilkina, George Nemhauser, Shabbir Ahmed, and Yufen Shao. Learning to run heuristics in tree search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
- Robert Kleinberg, Kevin Leyton-Brown, and Brendan Lucier. Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
- Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, and Devon Graham. Procrastinating with confidence: Near-optimal, anytime, adaptive algorithm configuration. Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
- Iasonas Kokkinos. Rapid deformable object detection using dual-tree branch-and-bound. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 2681–2689, 2011.
- Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5):1902–1914, 2001.
- Nikos Komodakis, Nikos Paragios, and Georgios Tziritas. Clustering via LP-based stabilities. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2009.
- Markus Kruber, Marco E Lubbecke, and Axel Parmentier. Learning when to use a decomposition. In International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 202–210.
- Ailsa H Land and Alison G Doig. An automatic method of solving discrete programming problems. Econometrica, pages 497–520, 1960.
- Kevin Leyton-Brown, Mark Pearson, and Yoav Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 66–76, Minneapolis, MN, 2000.
- Jeff Linderoth and Martin Savelsbergh. A computational study of search strategies for mixed integer programming. INFORMS Journal on Computing, 11(2):173–187, 1999.
- Andrea Lodi and Giulia Zarpellon. On learning and branching: a survey. TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, 25(2):207–236, 2017.
- Pascal Massart. Some applications of concentration inequalities to statistics. In Annales de la Faculte des sciences de Toulouse: Mathematiques, volume 9, pages 245–303, 2000.
- Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
- George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
- Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy. Guiding combinatorial optimization with UCT. In International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Springer, 2012.
- Tuomas Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135:1–54, January 2002.
- Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting $60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
- Karthik Sridharan. Learning from an optimization viewpoint. PhD thesis, Toyota Technological Institute at Chicago, 2012.
- Gellert Weisz, Andres Gyorgy, and Csaba Szepesvari. LeapsAndBounds: A method for approximately optimal algorithm configuration. In International Conference on Machine Learning (ICML), 2018.
- Gellert Weisz, Andres Gyorgy, and Csaba Szepesvari. CapsAndRuns: An improved method for approximately optimal algorithm configuration. International Conference on Machine Learning (ICML), 2019.
- Theorem B.2 (e.g., Mohri et al. [38]). Let F ⊆ [0, 1]^X be a set of functions mapping a domain X to [0, 1]. With probability at least 1 − δ over the draw of N samples S = {x_1, ..., x_N} ∼ D^N, every f ∈ F satisfies E_{x∼D}[f(x)] ≤ (1/N) Σ_{i=1}^N f(x_i) + 2 R_N(F) + √(ln(1/δ)/(2N)), where R_N(F) denotes the Rademacher complexity of F.
- Hölder's inequality: for conjugate exponents 1/p_0 + 1/p_1 = 1 and all functions u and w, ‖uw‖_1 ≤ ‖u‖_{p_0} ‖w‖_{p_1}.
- Next, we evaluate f_{r_0}(x_i).
- The class G is statistically learnable.
- The class F is not γ-statistically learnable.
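The Hölder-type inequality above is easy to spot-check numerically; a quick sketch over finite vectors (p_0 = p_1 = 2 is the familiar Cauchy–Schwarz case):

```python
def lp_norm(v, p):
    """Discrete Lp norm of a vector v."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def holder_holds(u, w, p0, p1, tol=1e-12):
    """Check ||u w||_1 <= ||u||_{p0} ||w||_{p1} for conjugate exponents."""
    assert abs(1.0 / p0 + 1.0 / p1 - 1.0) < tol, "exponents must satisfy 1/p0 + 1/p1 = 1"
    lhs = sum(abs(a * b) for a, b in zip(u, w))
    return lhs <= lp_norm(u, p0) * lp_norm(w, p1) + tol

u = [1.0, -2.0, 3.0]
w = [0.5, 0.25, -1.0]
print(holder_holds(u, w, 2.0, 2.0))  # Cauchy–Schwarz case
print(holder_holds(u, w, 3.0, 1.5))  # another conjugate pair
```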
