# Generalization in portfolio-based algorithm selection

Abstract:

Portfolio-based algorithm selection has seen tremendous practical success over the past two decades. This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with…

Introduction

- Algorithms for many problems have tunable parameters.
- With deft parameter tuning, these algorithms can often efficiently solve computationally challenging problems.
- The best parameter setting for one problem is rarely optimal for another.
- Algorithm portfolios—which are finite sets of parameter settings—are used in practice to deal with this variability.
- A portfolio is often used in conjunction with an algorithm selector, which is a function that determines which parameter setting in the portfolio to employ on any input problem instance.
- Portfolio-based algorithm selection has seen tremendous empirical success, fueling breakthroughs in combinatorial auction winner determination [23, 32], SAT [38], integer programming [22, 39], planning [15, 29], and many other domains
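The selector abstraction above can be made concrete in a few lines. The sketch below uses hypothetical names (`make_selector`, `score`) and a toy scalar "parameter setting"; it is an illustration of the concept, not the paper's implementation.

```python
# Minimal sketch of portfolio-based algorithm selection (hypothetical
# names; not the paper's code). A portfolio is a finite set of parameter
# settings; a selector maps an instance's feature vector to one setting.

def make_selector(portfolio, score):
    """Return a selector that, given an instance's features, picks the
    portfolio entry with the best (lowest) predicted cost per `score`."""
    def selector(features):
        return min(portfolio, key=lambda setting: score(setting, features))
    return selector

# Toy example: settings are scalars, and predicted cost is the distance
# to a fictitious "ideal" parameter encoded in the instance features.
portfolio = [0.1, 0.5, 0.9]
score = lambda setting, feats: abs(setting - feats["ideal"])
select = make_selector(portfolio, score)
print(select({"ideal": 0.42}))  # chooses the closest setting, 0.5
```

In practice the `score` function is itself learned (e.g. a regression model of runtime or tree size from instance features), which is where the generalization questions studied in the paper arise.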

Highlights

- Algorithms for many problems have tunable parameters
- A portfolio is often used in conjunction with an algorithm selector, which is a function that determines which parameter setting in the portfolio to employ on any input problem instance
- Given a training set of size N, we prove that the generalization error is bounded by O(√((d + κ log t)/N)), where κ is the size of the portfolio and d measures the intrinsic complexity of the algorithm selector, as we define in Section 3
- We provided guarantees for learning a portfolio of parameter settings in conjunction with an algorithm selector for that portfolio
- We provided a tight bound on the number of samples necessary and sufficient to ensure that the selector’s average performance on the training set generalizes to its expected performance on the real, unknown distribution over problem instances
- There is a tradeoff when increasing the portfolio size: a large portfolio allows for the possibility of including a strong parameter setting for every instance, but this potential improvement is offset by a growing propensity to overfit
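The tradeoff in the bound can be evaluated numerically. The snippet below plugs illustrative values into a bound of the stated form √((d + κ log t)/N); the constants d and t and the absence of any hidden leading constant are assumptions made purely for illustration.

```python
import math

def generalization_bound(kappa, N, d=10, t=100):
    """Illustrative bound of the form sqrt((d + kappa*log t)/N): it
    grows with the portfolio size kappa and shrinks with the
    training-set size N (constants here are placeholders)."""
    return math.sqrt((d + kappa * math.log(t)) / N)

# Larger portfolios worsen the bound; more data tightens it.
assert generalization_bound(32, 10_000) > generalization_bound(4, 10_000)
assert generalization_bound(8, 200_000) < generalization_bound(8, 10_000)
```

This is exactly the tension in the last bullet: growing κ improves the best achievable training performance while inflating the κ log t term in the generalization error.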

Methods

- The authors provide experiments illustrating the tradeoff investigated from a theoretical perspective in the previous sections: as the portfolio size increases, one can hope to include a well-suited parameter setting for any problem instance, but it becomes increasingly difficult to avoid overfitting.
- The authors illustrate this in the context of integer programming algorithm configuration.
- The authors aim to learn a portfolio P and selector f resulting in small expected tree size E[u_{f(z)}(z)]
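The experiments construct the portfolio iteratively. A minimal sketch of such a greedy construction is below; the helper name, the synthetic cost matrix, and the tie-breaking rule are assumptions for illustration, not the authors' code.

```python
import random

def greedy_portfolio(perf, kappa):
    """Greedily grow a portfolio of kappa parameter settings.

    perf[i][j] is the cost (e.g. branch-and-bound tree size) of setting
    j on training instance i. At each step we add the setting that most
    reduces the average per-instance best-in-portfolio cost.
    """
    n, m = len(perf), len(perf[0])
    portfolio = []
    best = [float("inf")] * n  # best cost achieved so far per instance
    for _ in range(kappa):
        def value(j):
            # average cost if setting j were added to the portfolio
            return sum(min(best[i], perf[i][j]) for i in range(n)) / n
        j = min(range(m), key=value)
        portfolio.append(j)
        best = [min(best[i], perf[i][j]) for i in range(n)]
    return portfolio, sum(best) / n

random.seed(0)
perf = [[random.random() for _ in range(8)] for _ in range(50)]
_, v1 = greedy_portfolio(perf, 1)
_, v4 = greedy_portfolio(perf, 4)
assert v4 <= v1  # a larger portfolio can only improve training performance
```

The final assertion is the training-side half of the tradeoff; the paper's point is that the test-side performance of the learned selector eventually stops tracking this improvement as κ grows.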

Conclusion

- Focusing first on test performance using the largest training set size N = 2·10⁵, the authors see that test performance continues to improve as the portfolio size increases, though training and test performance steadily diverge
- This illustrates the tradeoff investigated from a theoretical perspective in this paper: as the portfolio size increases, one can hope to include a well-suited parameter setting for every instance, but the generalization error will worsen.
- A direction for future research is to understand how the diversity of a portfolio impacts its generalization error, since algorithm portfolios are often expressly designed to be diverse

Objectives

- That is a distinct problem from the authors’ setting, since their goal is to learn an algorithm selector rather than a schedule.
- The authors’ goal is to bound the number of ways the functions g_T can label these instances.
- The authors’ goal is to bound the number of ways the functions g_X can label these instances as X varies over R^(m×κ).
- The authors aim to learn a portfolio P and selector f resulting in small expected tree size E[u_{f(z)}(z)]

Funding

- This material is based on work supported by the National Science Foundation under grants CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS-1618714, IIS-1718457, IIS-1901403, and SES-1919453; the ARO under awards W911NF1710082 and W911NF2010081; the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003; an AWS Machine Learning Research Award; an Amazon Research Award; a Bloomberg Research Grant; a Microsoft Research Faculty Fellowship; an IBM PhD fellowship; and a fellowship from Carnegie Mellon University’s Center for Machine Learning and Health

Study subjects and analysis

Specifically, the authors plot v_κ/v_1: the blue solid (N = 10²), orange dashed (N = 10³), green dotted (N = 10⁴), and purple dashed (N = 2·10⁵) lines. By the iterative fashion in which the portfolio was constructed, v_1 is the performance of the best single parameter setting for the particular distribution, so v_1 is already highly optimized.

Reference

- Tobias Achterberg. SCIP: solving constraint integer programs. Mathematical Programming Computation, 1(1):1–41, 2009.
- Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
- Maria-Florina Balcan. Data-driven algorithm design. In Tim Roughgarden, editor, Beyond Worst Case Analysis of Algorithms. Cambridge University Press, 2020.
- Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learningtheoretic foundations of algorithm configuration for combinatorial partitioning problems. Conference on Learning Theory (COLT), 2017.
- Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
- Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), 2018.
- Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized Lloyd’s families. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? arXiv preprint arXiv:1908.02894, 2019.
- Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. Proceedings of the International Conference on Learning Representations (ICLR), 2020.
- Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Learning to optimize computational resources: Frugal training with generalization guarantees. AAAI Conference on Artificial Intelligence (AAAI), 2020.
- Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Refined bounds for algorithm configuration: The knife-edge of dual class approximability. In International Conference on Machine Learning (ICML), 2020.
- Evelyn Beale. Branch and bound methods for mathematical programming systems. Annals of Discrete Mathematics, 5:201–219, 1979.
- Michel Benichou, Jean-Michel Gauthier, Paul Girodet, Gerard Hentges, Gerard Ribiere, and O Vincent. Experiments in mixed-integer linear programming. Mathematical Programming, 1 (1):76–94, 1971.
- R. C. Buck. Partition of space. Amer. Math. Monthly, 50:541–544, 1943. ISSN 0002-9890.
- Isabel Cenamor, Tomas De La Rosa, and Fernando Fernandez. The IBaCoP planning system: Instance-based configured portfolios. Journal of Artificial Intelligence Research, 56:657–691, 2016.
- Vikas Garg and Adam Kalai. Supervising unsupervised learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS). 2018.
- J-M Gauthier and Gerard Ribiere. Experiments in mixed-integer linear programming using pseudo-costs. Mathematical Programming, 12(1):26–47, 1977.
- Prateek Gupta, Maxime Gasse, Elias Khalil, Pawan Mudigonda, Andrea Lodi, and Yoshua Bengio. Hybrid models for learning to branch. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
- Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
- David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and computation, 100(1):78–150, 1992.
- Frank Hutter, Lin Xu, Holger H Hoos, and Kevin Leyton-Brown. Algorithm runtime prediction: Methods & evaluation. Artificial Intelligence, 206:79–111, 2014.
- Serdar Kadioglu, Yuri Malitsky, Meinolf Sellmann, and Kevin Tierney. ISAC—instance-specific algorithm configuration. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2010.
- Kevin Leyton-Brown. Resource allocation in competitive multiagent systems. PhD thesis, Stanford University, 2003.
- Kevin Leyton-Brown, Mark Pearson, and Yoav Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 66–76, Minneapolis, MN, 2000.
- Jeff Linderoth and Martin Savelsbergh. A computational study of search strategies for mixed integer programming. INFORMS Journal of Computing, 11(2):173–187, 1999.
- Shengcai Liu, Ke Tang, Yunwei Lei, and Xin Yao. On performance estimation in automatic algorithm configuration. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
- Balas K Natarajan. On learning sets and functions. Machine Learning, 4(1):67–97, 1989.
- George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
- Sergio Nunez, Daniel Borrajo, and Carlos Linares Lopez. Automatic construction of optimal static sequential portfolios for ai planning and beyond. Artificial Intelligence, 226:75–101, 2015.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Tuomas Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135:1–54, January 2002.
- Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting $60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
- Tzur Sayag, Shai Fine, and Yishay Mansour. Combining multiple heuristics. In Annual Symposium on Theoretical Aspects of Computer Science, pages 242–253.
- Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
- Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 1577–1584, 2009.
- Matthew Streeter, Daniel Golovin, and Stephen F. Smith. Combining multiple heuristics online. In AAAI Conference on Artificial Intelligence (AAAI), 2007.
- Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
- L. Xu, F. Hutter, H.H. Hoos, and K. Leyton-Brown. SATzilla: portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, 2008.
- Lin Xu, Holger Hoos, and Kevin Leyton-Brown. Hydra: Automatically configuring algorithms for portfolio-based selection. In AAAI Conference on Artificial Intelligence (AAAI), 2010.
- Giulia Zarpellon, Jason Jo, Andrea Lodi, and Yoshua Bengio. Parameterizing branch-and-bound search trees to learn branching policies. arXiv preprint arXiv:2002.05120, 2020.
Notes

- Each selector f ∈ F maps to at most κ parameter settings.
- If i ∈ T, then f_T labels instance i as 1; otherwise 0. Therefore, S is shattered by U_F, so the VC dimension of U_F is at least κ. These two claims illustrate that VCdim(U_F) ≥ max{κ, d} = Ω(κ + d).
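The final inequality can be written out in standard notation. This is a sketch consistent with the claims above (the max-to-sum step uses max{a, b} ≥ (a + b)/2), not the paper's full proof:

```latex
\mathrm{VCdim}(\mathcal{U}_{\mathcal{F}})
  \;\ge\; \max\{\kappa,\; d\}
  \;\ge\; \frac{\kappa + d}{2}
  \;=\; \Omega(\kappa + d)
```

Together with the upper bound from the Highlights, this shows the sample complexity dependence on κ + d is tight up to logarithmic factors.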
