Generalization in portfolio-based algorithm selection

Abstract:

Portfolio-based algorithm selection has seen tremendous practical success over the past two decades. This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with...

Introduction
  • Algorithms for many problems have tunable parameters.
  • With deft parameter tuning, these algorithms can often efficiently solve computationally challenging problems.
  • The best parameter setting for one problem is rarely optimal for another.
  • Algorithm portfolios—which are finite sets of parameter settings—are used in practice to deal with this variability.
  • A portfolio is often used in conjunction with an algorithm selector, which is a function that determines which parameter setting in the portfolio to employ on any input problem instance (a minimal code sketch of this pipeline follows this list).
  • Portfolio-based algorithm selection has seen tremendous empirical success, fueling breakthroughs in combinatorial auction winner determination [23, 32], SAT [38], integer programming [22, 39], planning [15, 29], and many other domains.
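
To make the pipeline concrete, here is a minimal sketch of portfolio-based algorithm selection. The parameter names, the feature dimension, and the nearest-centroid selector are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

# Hypothetical setup: a portfolio is a finite set of parameter settings, and a
# selector maps an instance's feature vector to one member of the portfolio.
portfolio = [
    {"branching": "pseudocost", "cuts": 0.2},
    {"branching": "most-infeasible", "cuts": 0.8},
    {"branching": "strong", "cuts": 0.5},
]

def selector(instance_features: np.ndarray, centroids: np.ndarray) -> int:
    """A simple nearest-centroid selector: return the index of the portfolio
    member whose (learned) centroid is closest to the instance's features."""
    distances = np.linalg.norm(centroids - instance_features, axis=1)
    return int(np.argmin(distances))

rng = np.random.default_rng(0)
# In practice the centroids would be fit on training instances; random here.
centroids = rng.normal(size=(len(portfolio), 4))

instance_features = rng.normal(size=4)  # features of a new problem instance
params = portfolio[selector(instance_features, centroids)]
print("run the solver with parameters:", params)
```
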
Highlights
  • Given a training set of size N, we prove that the generalization error is bounded by Õ(√((d + κ log t)/N)) (restated as a display after this list), where κ is the size of the portfolio and d measures the intrinsic complexity of the algorithm selector, as we define in Section 3
  • We provided guarantees for learning a portfolio of parameter settings in conjunction with an algorithm selector for that portfolio
  • We provided a tight bound on the number of samples necessary and sufficient to ensure that the selector's average performance on the training set generalizes to its expected performance over the real, unknown distribution of problem instances
  • There is a tradeoff when increasing the portfolio size, since a large portfolio allows for the possibility of including a strong parameter setting for every instance, but this potential for performance improvement is overshadowed by a worsening propensity towards overfitting
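
The highlighted bound, restated as a display using the summary's notation u_{f(z)}(z) for the performance of the selected parameter setting on instance z (a reconstruction from the highlight above; t is a third complexity parameter defined in the paper but not in this summary):

```latex
\[
\underbrace{\left| \frac{1}{N}\sum_{i=1}^{N} u_{f(z_i)}(z_i)
  - \mathbb{E}_{z \sim \mathcal{D}}\!\left[ u_{f(z)}(z) \right] \right|}_{\text{generalization error}}
\;=\; \tilde{O}\!\left( \sqrt{\frac{d + \kappa \log t}{N}} \right)
\]
```
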
Methods
  • The authors provide experiments illustrating the tradeoff they investigated from a theoretical perspective in the previous sections: as the portfolio size increases, one can hope to include a well-suited parameter setting for any problem instance, but it becomes increasingly difficult to avoid overfitting.
  • The authors illustrate this in the context of integer programming algorithm configuration.
  • The authors aim to learn a portfolio P and selector f resulting in small expected tree size E[u_{f(z)}(z)].
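
A minimal sketch of this kind of experiment, under loud assumptions: the performance matrix is synthetic (real numbers would come from solver runs, e.g. branch-and-bound tree sizes), a per-instance oracle selector stands in for a learned selector, and the greedy construction mirrors the shape of the experiment rather than the authors' exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_params = 200, 200, 50

# Synthetic stand-in for the performance u_rho(z): rows are instances,
# columns are parameter settings. Real data would come from a solver.
perf_train = rng.lognormal(mean=3.0, sigma=1.0, size=(n_train, n_params))
perf_test = rng.lognormal(mean=3.0, sigma=1.0, size=(n_test, n_params))

def greedy_portfolio(perf: np.ndarray, kappa: int) -> list[int]:
    """Grow a portfolio one setting at a time, each time adding the column
    that most reduces the mean per-instance best performance on training."""
    chosen: list[int] = []
    for _ in range(kappa):
        candidates = [j for j in range(perf.shape[1]) if j not in chosen]
        scores = [perf[:, chosen + [j]].min(axis=1).mean() for j in candidates]
        chosen.append(candidates[int(np.argmin(scores))])
    return chosen

for kappa in (1, 2, 4, 8, 16):
    P = greedy_portfolio(perf_train, kappa)
    train = perf_train[:, P].min(axis=1).mean()  # oracle selector on train set
    test = perf_test[:, P].min(axis=1).mean()    # same portfolio on fresh data
    print(f"kappa={kappa:2d}  train={train:7.2f}  test={test:7.2f}")
```

With enough candidate settings, the training value keeps improving while the train/test gap tends to widen as κ grows, which is the overfitting effect described above.
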
Conclusion
  • Focusing first on test performance with the largest training set size N = 2·10^5, the authors see that test performance continues to improve as the portfolio size increases, though training and test performance steadily diverge.
  • This illustrates the tradeoff investigated from a theoretical perspective in this paper: as the portfolio size increases, one can hope to include a well-suited parameter setting for every instance, but the generalization error will worsen.
  • A direction for future research is to understand how the diversity of a portfolio impacts its generalization error, since algorithm portfolios are often expressly designed to be diverse
Objectives
  • Prior work on learning schedules addresses a distinct problem, since the authors' goal is to learn an algorithm selector rather than a schedule.
  • The authors' goal is to bound the number of ways the functions g_T can label these instances.
  • The authors' goal is to bound the number of ways the functions g_X can label these instances as X varies over R^{m×κ}.
  • The authors aim to learn a portfolio P and selector f resulting in small expected tree size E[u_{f(z)}(z)].
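
One standard way to carry out such a counting step is via hyperplane arrangements; the following is a sketch under the assumption that each comparison the selectors g_X make between portfolio members is a linear threshold in the mκ entries of X (the paper's actual argument may differ in its details; Buck's partition-of-space bound is reference [14] below).

```latex
% An arrangement of k hyperplanes partitions R^{m*kappa} into at most
% sum_{i=0}^{m*kappa} binom(k, i) regions (Buck, 1943). If the labeling
% induced by g_X is constant within each region, the number of distinct
% labelings of a fixed set of instances z_1, ..., z_N is at most the
% number of regions:
\[
\left|\left\{ \big(g_X(z_1), \dots, g_X(z_N)\big) : X \in \mathbb{R}^{m \times \kappa} \right\}\right|
\;\le\; \sum_{i=0}^{m\kappa} \binom{k}{i} \;=\; O\!\left(k^{m\kappa}\right)
\]
```

Because this count grows only polynomially in the number of hyperplanes, it translates into the VC-type dimension bounds behind the generalization guarantee.
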
Funding
  • This material is based on work supported by the National Science Foundation under grants CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS-1618714, IIS-1718457, IIS-1901403, and SES-1919453; the ARO under awards W911NF1710082 and W911NF2010081; the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003; an AWS Machine Learning Research Award; an Amazon Research Award; a Bloomberg Research Grant; a Microsoft Research Faculty Fellowship; an IBM PhD fellowship; and a fellowship from Carnegie Mellon University's Center for Machine Learning and Health.
Study subjects and analysis
  • Specifically, we plot v_κ/v_1. These are the blue solid (N = 10^2), orange dashed (N = 10^3), green dotted (N = 10^4), and purple dashed (N = 2·10^5) lines. By the iterative fashion in which we constructed the portfolio, v_1 is the performance of the best single parameter setting for the particular distribution, so v_1 is already highly optimized.
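
A minimal sketch of how such a normalized plot could be produced; the v_κ values here are hypothetical placeholders (real curves would come from the experiments), with colors and line styles matching the description above.

```python
import matplotlib.pyplot as plt
import numpy as np

kappas = np.arange(1, 11)

# Hypothetical v_kappa curves per training-set size N (placeholders only).
curves = [
    (1e2, "blue",   "solid",  1.00 - 0.02 * np.log(kappas)),
    (1e3, "orange", "dashed", 1.00 - 0.04 * np.log(kappas)),
    (1e4, "green",  "dotted", 1.00 - 0.06 * np.log(kappas)),
    (2e5, "purple", "dashed", 1.00 - 0.08 * np.log(kappas)),
]
for N, color, style, v in curves:
    # Normalize by v_1, the performance of the best single parameter setting.
    plt.plot(kappas, v / v[0], color=color, linestyle=style, label=f"N = {N:g}")

plt.xlabel("portfolio size κ")
plt.ylabel("normalized performance v_κ / v_1")
plt.legend()
plt.show()
```
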

References
  • Tobias Achterberg. SCIP: Solving constraint integer programs. Mathematical Programming Computation, 1(1):1–41, 2009.
  • Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
  • Maria-Florina Balcan. Data-driven algorithm design. In Tim Roughgarden, editor, Beyond Worst Case Analysis of Algorithms. Cambridge University Press, 2020.
  • Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, and Colin White. Learning-theoretic foundations of algorithm configuration for combinatorial partitioning problems. In Conference on Learning Theory (COLT), 2017.
  • Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
  • Maria-Florina Balcan, Travis Dick, and Ellen Vitercik. Dispersion for data-driven algorithm design, online learning, and private optimization. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS), 2018.
  • Maria-Florina Balcan, Travis Dick, and Colin White. Data-driven clustering via parameterized Lloyd's families. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? arXiv preprint arXiv:1908.02894, 2019.
  • Maria-Florina Balcan, Travis Dick, and Manuel Lang. Learning to link. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Learning to optimize computational resources: Frugal training with generalization guarantees. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
  • Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Refined bounds for algorithm configuration: The knife-edge of dual class approximability. In International Conference on Machine Learning (ICML), 2020.
  • Evelyn Beale. Branch and bound methods for mathematical programming systems. Annals of Discrete Mathematics, 5:201–219, 1979.
  • Michel Benichou, Jean-Michel Gauthier, Paul Girodet, Gerard Hentges, Gerard Ribiere, and O. Vincent. Experiments in mixed-integer linear programming. Mathematical Programming, 1(1):76–94, 1971.
  • R. C. Buck. Partition of space. American Mathematical Monthly, 50:541–544, 1943.
  • Isabel Cenamor, Tomas De La Rosa, and Fernando Fernandez. The IBaCoP planning system: Instance-based configured portfolios. Journal of Artificial Intelligence Research, 56:657–691, 2016.
  • Vikas Garg and Adam Kalai. Supervising unsupervised learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • J.-M. Gauthier and Gerard Ribiere. Experiments in mixed-integer linear programming using pseudo-costs. Mathematical Programming, 12(1):26–47, 1977.
  • Prateek Gupta, Maxime Gasse, Elias Khalil, Pawan Mudigonda, Andrea Lodi, and Yoshua Bengio. Hybrid models for learning to branch. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
  • Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
  • David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
  • Frank Hutter, Lin Xu, Holger H. Hoos, and Kevin Leyton-Brown. Algorithm runtime prediction: Methods & evaluation. Artificial Intelligence, 206:79–111, 2014.
  • Serdar Kadioglu, Yuri Malitsky, Meinolf Sellmann, and Kevin Tierney. ISAC—instance-specific algorithm configuration. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2010.
  • Kevin Leyton-Brown. Resource Allocation in Competitive Multiagent Systems. PhD thesis, Stanford University, 2003.
  • Kevin Leyton-Brown, Mark Pearson, and Yoav Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 66–76, Minneapolis, MN, 2000.
  • Jeff Linderoth and Martin Savelsbergh. A computational study of search strategies for mixed integer programming. INFORMS Journal on Computing, 11(2):173–187, 1999.
  • Shengcai Liu, Ke Tang, Yunwei Lei, and Xin Yao. On performance estimation in automatic algorithm configuration. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
  • Balas K. Natarajan. On learning sets and functions. Machine Learning, 4(1):67–97, 1989.
  • George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
  • Sergio Nunez, Daniel Borrajo, and Carlos Linares Lopez. Automatic construction of optimal static sequential portfolios for AI planning and beyond. Artificial Intelligence, 226:75–101, 2015.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • Tuomas Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135:1–54, January 2002.
  • Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting $60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
  • Tzur Sayag, Shai Fine, and Yishay Mansour. Combining multiple heuristics. In Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 242–253, 2006.
  • Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
  • Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 1577–1584, 2009.
  • Matthew Streeter, Daniel Golovin, and Stephen F. Smith. Combining multiple heuristics online. In AAAI Conference on Artificial Intelligence (AAAI), 2007.
  • Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
  • Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, 2008.
  • Lin Xu, Holger Hoos, and Kevin Leyton-Brown. Hydra: Automatically configuring algorithms for portfolio-based selection. In AAAI Conference on Artificial Intelligence (AAAI), 2010.
  • Giulia Zarpellon, Jason Jo, Andrea Lodi, and Yoshua Bengio. Parameterizing branch-and-bound search trees to learn branching policies. arXiv preprint arXiv:2002.05120, 2020.