Refined bounds for algorithm configuration: The knife-edge of dual class approximability

ICML, pp. 580-590, 2020.


Abstract:

Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We in…

Introduction
  • Algorithms typically have tunable parameters that significantly impact their performance, measured in terms of runtime, solution quality, and so on.
  • Machine learning is often used to automate parameter tuning [20, 21, 23, 42]: given a training set of problem instances from the application domain at hand, this automated procedure returns a parameter setting that will ideally perform well on future, unseen instances.
  • Generalization bounds provide guidance when it comes to selecting the training set size.
  • They bound the difference between an algorithm’s performance on average over the training set and its expected performance on unseen instances.
  • These bounds can be used to evaluate a parameter setting returned by any black-box procedure: they bound the difference between that parameter’s average performance on the training set and its expected performance on future instances; a generic form of such a bound is sketched below.
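As a point of reference, a generic bound of this type (an illustrative uniform-convergence form, not the paper's refined data-dependent bound) can be written as follows, where cost(ρ, x) is the algorithm's performance with parameter setting ρ on instance x and the x_i are i.i.d. sample instances drawn from the distribution D:

```latex
% Illustrative uniform-convergence guarantee (generic form, not the paper's refined bound):
% with probability at least 1 - \delta over the draw of x_1, \dots, x_N \sim \mathcal{D},
\[
\sup_{\rho} \;
\left| \frac{1}{N} \sum_{i=1}^{N} \mathrm{cost}(\rho, x_i)
  \;-\; \mathbb{E}_{x \sim \mathcal{D}}\!\left[ \mathrm{cost}(\rho, x) \right] \right|
  \;\le\; \varepsilon(N, \delta),
\]
```

where the error term ε(N, δ) shrinks as the number of training instances N grows.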
Highlights
  • Algorithms typically have tunable parameters that significantly impact their performance, measured in terms of runtime, solution quality, and so on
  • In our integer programming experiments, we show that this data-dependent generalization guarantee can be much tighter than the best-known worst-case guarantee
  • We provided generalization guarantees for algorithm configuration, which bound the difference between a parameterized algorithm’s average empirical performance over a set of sample problem instances and its expected performance on future, unseen instances
  • We showed that if this approximation holds under the L∞-norm, it is possible to provide strong generalization guarantees.
  • If the approximation only holds under the Lp-norm for p < ∞, we showed that it is impossible in the worst case to provide nontrivial bounds; the two approximation conditions are sketched after this list.
  • Via experiments in the context of integer programming algorithm configuration, we demonstrated that our bounds can be significantly stronger than the best-known worst-case guarantees [7], leading to a sample complexity improvement of 70,000%.
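Paraphrasing the two approximability conditions behind this knife-edge (our notation, a sketch rather than the paper's exact statement): f_x maps a parameter ρ to the algorithm's performance on instance x, and f̃_x is its approximation from a simpler, well-structured class over the parameter set P.

```latex
% L_\infty (uniform) approximability -- strong generalization guarantees are possible:
\[
\sup_{\rho \in \mathcal{P}} \bigl| f_x(\rho) - \tilde{f}_x(\rho) \bigr| \;\le\; \varepsilon
\]
% L_p approximability for p < \infty -- nontrivial worst-case guarantees are impossible:
\[
\Bigl( \int_{\mathcal{P}} \bigl| f_x(\rho) - \tilde{f}_x(\rho) \bigr|^{p} \, d\rho \Bigr)^{1/p} \;\le\; \varepsilon
\]
```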
Methods
  • The authors analyze distributions over IPs formulating the combinatorial auction winner determination problem under the OR-bidding language [41], which the authors generate using the Combinatorial Auction Test Suite (CATS) [34].
  • The authors use the “arbitrary” generator with 200 bids and 100 goods, resulting in IPs with about 200 variables, and the “regions” generator with 400 bids.
  • The authors use the algorithm described in Appendix D.1 of the paper by Balcan et al. [7] to compute the functions f_x^*.
  • The implementation overrides the default variable selection policy (VSP) of CPLEX 12.8.0.0 via the C API (see the sketch below).
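A minimal sketch of this experimental setup, under stated assumptions: `generate_cats_ip` and `branch_and_bound_tree_size` are hypothetical helpers standing in for the CATS generator and for a CPLEX 12.8 run whose variable selection policy is overridden with a parameter mu (the actual experiments do this through the C API).

```python
import statistics

# Hypothetical helpers (not part of CATS or CPLEX): generate_cats_ip stands in for the
# CATS "arbitrary"/"regions" generators, and branch_and_bound_tree_size stands in for a
# CPLEX 12.8 run whose variable selection policy is overridden with parameter mu.
from experiment_utils import generate_cats_ip, branch_and_bound_tree_size


def average_tree_size(mu, instances):
    """Average empirical performance (tree size) of parameter mu over sample IPs."""
    return statistics.mean(branch_and_bound_tree_size(ip, mu) for ip in instances)


if __name__ == "__main__":
    # Training set: winner-determination IPs from the "arbitrary" CATS generator
    # with 200 bids and 100 goods (about 200 variables each).
    train = [generate_cats_ip(generator="arbitrary", bids=200, goods=100)
             for _ in range(100)]
    # Sweep a grid over the parameter controlling the variable selection score.
    grid = [i / 20 for i in range(21)]
    best_mu = min(grid, key=lambda mu: average_tree_size(mu, train))
    print("Empirically best parameter on the training set:", best_mu)
```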
Results
  • In Figure 2, the authors see that the bound significantly beats the worst-case bound up until the point where there are approximately 100,000,000 training instances (a toy illustration of such a crossover appears below).
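One way such a crossover can arise is when a data-dependent bound plateaus at a small additive term while the worst-case bound, despite a much larger constant, keeps shrinking with N. The toy computation below uses made-up shapes and constants purely for illustration; it is not the paper's actual bounds.

```python
import math

# Toy stand-ins for the two bounds as functions of the training set size N.
# Shapes and constants are illustrative only, not the paper's actual bounds.
def data_dependent_bound(n, eps=0.01, c=5.0):
    return eps + c / math.sqrt(n)      # plateaus at the additive term eps

def worst_case_bound(n, c=1000.0):
    return c / math.sqrt(n)            # much larger constant, but keeps shrinking

n = 10
while data_dependent_bound(n) < worst_case_bound(n):
    n *= 10
print("The worst-case bound overtakes somewhere before N =", n)
```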
Conclusion
  • The authors provided generalization guarantees for algorithm configuration, which bound the difference between a parameterized algorithm’s average empirical performance over a set of sample problem instances and its expected performance on future, unseen instances.
  • If the approximation only holds under the Lp-norm for p < ∞, the authors showed that it is impossible in the worst case to provide nontrivial bounds.
  • Via experiments in the context of integer programming algorithm configuration, the authors demonstrated that the bounds can be significantly stronger than the best-known worst-case guarantees [7], leading to a sample complexity improvement of 70,000%.
Funding
  • We thank Kevin Leyton-Brown for a stimulating discussion that inspired us to pursue this research direction. This material is based on work supported by the National Science Foundation under grants CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS-1618714, IIS-1718457, IIS-1901403, and SES-1919453; the ARO under awards W911NF-17-1-0082 and W911NF2010081; a fellowship from Carnegie Mellon University's Center for Machine Learning and Health; the Defense Advanced Research Projects Agency under cooperative agreement HR0011202000; an Amazon Research Award; an AWS Machine Learning Research Award; a Bloomberg Research Grant; and a Microsoft Research Faculty Fellowship.
Reference
  • Tobias Achterberg. SCIP: solving constraint integer programs. Mathematical Programming Computation, 1(1):1–41, 2009.
  • Alejandro Marcos Alvarez, Quentin Louveaux, and Louis Wehenkel. A machine learning-based approximation of strong branching. INFORMS Journal on Computing, 29(1):185–195, 2017.
  • Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
  • Patrick Assouad. Densite et dimension. Annales de l’Institut Fourier, 33(3):233–282, 1983.
  • Amine Balafrej, Christian Bessiere, and Anastasia Paparrizou. Multi-armed bandits for adaptive constraint propagation. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learningtheoretic foundations of algorithm configuration for combinatorial partitioning problems. Conference on Learning Theory (COLT), 2017.
  • Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
  • Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), 2018.
  • Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized Lloyd’s families. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? arXiv preprint arXiv:1908.02894, 2019.
  • Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Learning to optimize computational resources: Frugal training with generalization guarantees. AAAI Conference on Artificial Intelligence (AAAI), 2020.
  • Evelyn Beale. Branch and bound methods for mathematical programming systems. Annals of Discrete Mathematics, 5:201–219, 1979.
  • Michel Benichou, Jean-Michel Gauthier, Paul Girodet, Gerard Hentges, Gerard Ribiere, and O Vincent. Experiments in mixed-integer linear programming. Mathematical Programming, 1 (1):76–94, 1971.
  • Giovanni Di Liberto, Serdar Kadioglu, Kevin Leo, and Yuri Malitsky. Dash: Dynamic approach for switching heuristics. European Journal of Operational Research, 248(3):943–953, 2016.
  • J-M Gauthier and Gerard Ribiere. Experiments in mixed-integer linear programming using pseudo-costs. Mathematical Programming, 12(1):26–47, 1977.
  • Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
  • David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and computation, 100(1):78–150, 1992.
  • He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2014.
  • Eric Horvitz, Yongshao Ruan, Carla Gomez, Henry Kautz, Bart Selman, and Max Chickering. A Bayesian approach to tackling hard computational problems. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2001.
  • Frank Hutter, Holger Hoos, Kevin Leyton-Brown, and Thomas Stutzle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1): 267–306, 2009. ISSN 1076-9757.
  • Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization (LION), pages 507–523, 2011.
  • Serdar Kadioglu, Yuri Malitsky, Meinolf Sellmann, and Kevin Tierney. ISAC-instance-specific algorithm configuration. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2010.
  • Jorg Hendrik Kappes, Markus Speth, Gerhard Reinelt, and Christoph Schnorr. Towards efficient and exact map-inference for large scale discrete computer vision problems via combinatorial optimization. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1752–1758. IEEE, 2013.
  • Elias Boutros Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In AAAI Conference on Artificial Intelligence (AAAI), 2016.
  • Elias Boutros Khalil, Bistra Dilkina, George Nemhauser, Shabbir Ahmed, and Yufen Shao. Learning to run heuristics in tree search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • Robert Kleinberg, Kevin Leyton-Brown, and Brendan Lucier. Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, and Devon Graham. Procrastinating with confidence: Near-optimal, anytime, adaptive algorithm configuration. Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
  • Iasonas Kokkinos. Rapid deformable object detection using dual-tree branch-and-bound. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 2681–2689, 2011.
  • Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5):1902–1914, 2001.
  • Nikos Komodakis, Nikos Paragios, and Georgios Tziritas. Clustering via LP-based stabilities. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2009.
  • Markus Kruber, Marco E Lubbecke, and Axel Parmentier. Learning when to use a decomposition. In International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 202–210.
  • Ailsa H Land and Alison G Doig. An automatic method of solving discrete programming problems. Econometrica, pages 497–520, 1960.
  • Kevin Leyton-Brown, Mark Pearson, and Yoav Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 66–76, Minneapolis, MN, 2000.
  • Jeff Linderoth and Martin Savelsbergh. A computational study of search strategies for mixed integer programming. INFORMS Journal of Computing, 11(2):173–187, 1999.
  • Andrea Lodi and Giulia Zarpellon. On learning and branching: a survey. TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, 25(2):207–236, 2017.
  • Pascal Massart. Some applications of concentration inequalities to statistics. In Annales de la Faculte des sciences de Toulouse: Mathematiques, volume 9, pages 245–303, 2000.
  • Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
  • George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
  • Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy. Guiding combinatorial optimization with UCT. In International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Springer, 2012.
  • Tuomas Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135:1–54, January 2002.
  • Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting 60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
  • Karthik Sridharan. Learning from an optimization viewpoint. PhD thesis, Toyota Technological Institute at Chicago, 2012.
  • Gellert Weisz, Andres Gyorgy, and Csaba Szepesvari. LeapsAndBounds: A method for approximately optimal algorithm configuration. In International Conference on Machine Learning (ICML), 2018.
  • Gellert Weisz, Andres Gyorgy, and Csaba Szepesvari. CapsAndRuns: An improved method for approximately optimal algorithm configuration. International Conference on Machine Learning (ICML), 2019.