Agnostic Learning with Multiple Objectives

NeurIPS 2020

Cited by 0 | Views 56

Abstract

Most machine learning tasks are inherently multi-objective. This means that the learner has to come up with a model that performs well across a number of base objectives L1, ..., Lp, as opposed to a single one. Since optimizing with respect to multiple objectives at the same time is often computationally expensive, the base objectives ... [abstract truncated]

Introduction
  • The authors propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), inspired by the Agnostic Federated Learning algorithm [Mohri et al., 2019], in which the underlying model is optimized for any possible distribution of the mixture weights in the ensemble loss Lλ = λ1L1 + ... + λpLp (see the formulation sketched after this list).
  • Instead of fixing a particular λ in the simplex ∆p, which carries a high risk of over-fitting to a subset of the base objectives, the authors define an agnostic loss function that ensures the model performs well against any mixture, including the worst-case mixture weights.
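For concreteness, the two bullets above can be written out as follows. This is a minimal LaTeX sketch assuming the notation suggested by the digest (base losses Lk, mixture weights λ ranging over a subset Λ of the simplex ∆p); the paper's exact definitions of Λ and the hypothesis set H may carry additional constraints.

```latex
% Ensemble loss for a fixed mixture weight \lambda, and the agnostic
% loss taking the worst case over all admissible mixtures.
L_{\lambda}(h) = \sum_{k=1}^{p} \lambda_k \, L_k(h),
\qquad \lambda \in \Lambda \subseteq \Delta_p,
\qquad
\mathcal{L}_{\Lambda}(h) = \max_{\lambda \in \Lambda} L_{\lambda}(h).

% ALMO then solves the min-max problem
h^{\star} \in \operatorname*{arg\,min}_{h \in H} \, \max_{\lambda \in \Lambda} L_{\lambda}(h).
```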
Highlights
  • We propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), inspired by the Agnostic Federated Learning algorithm [Mohri et al., 2019], where the underlying model is optimized for any possible distribution of the mixture weights in the ensemble Lλ = λ1L1 + ... + λpLp
  • We provide results from training on just one loss at a time and discuss the improvements of ALMO as compared to that baseline
  • The ALMO algorithm can also be used as a tool for selecting base objectives (e.g., in AutoML), since during training ALMO increases the mixture weights of the worst-performing base losses while decreasing the others (a toy illustration of this dynamic follows this list)
  • We introduced a new framework (ALMO) for multi-objective learning that is robust against any mixture distribution of the base objectives, avoiding the subjective step of selecting mixture coefficients for multi-loss training
  • The experiments show that the ALMO framework builds more robust models for a variety of objectives in different machine learning problems
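The weight dynamics described above can be illustrated with a small toy experiment. The sketch below uses plain projected gradient descent-ascent (descent on the model, ascent on λ with a Euclidean projection onto the simplex in the style of Duchi et al. [2008]) rather than the stochastic mirror-prox procedure the paper builds on [Juditsky et al., 2011]; the two base losses, the synthetic data, and the step sizes are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Hypothetical toy data: a linear classifier w, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=200))

def base_losses_and_grads(w):
    """Two illustrative base losses (logistic and squared hinge)
    with their gradients; the paper's experiments use other losses."""
    margins = y * (X @ w)
    sigma = 1.0 / (1.0 + np.exp(-margins))
    l_log = np.mean(np.log1p(np.exp(-margins)))
    g_log = -(X * ((1.0 - sigma) * y)[:, None]).mean(axis=0)
    hinge = np.maximum(0.0, 1.0 - margins)
    l_sqh = np.mean(hinge ** 2)
    g_sqh = -(X * (2.0 * hinge * y)[:, None]).mean(axis=0)
    return np.array([l_log, l_sqh]), np.stack([g_log, g_sqh])

w = np.zeros(5)
lam = np.full(2, 0.5)        # mixture weights, initialized uniform
eta_w, eta_lam = 0.1, 0.05
for _ in range(500):
    L, G = base_losses_and_grads(w)
    w = w - eta_w * (lam @ G)                  # descent step on the model
    lam = project_simplex(lam + eta_lam * L)   # ascent step on the mixture
print(lam)  # weight concentrates on the worse-performing base loss
```

Because the ascent step moves λ toward the coordinates with the largest current losses, the final weights concentrate on whichever base loss the model handles worse, matching the selection behavior described in the highlight above.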
Results
  • The authors give data-dependent generalization bounds based on the Rademacher complexity of the underlying loss functions, which are used to define the optimization algorithm and the regularization for the problem.
  • In Section 2, the authors formally describe the ALMO framework, define the agnostic multi-objective loss function, and discuss the connection of the solution to the Pareto-optimal frontier.
  • The authors define the agnostic loss function and argue that, by optimizing this loss, the learner obtains a model that is robust against any distribution of the mixture weights λ in the ensemble Lλ = λ1L1 + ... + λpLp.
  • Under standard assumptions on W, Λ, and the base loss functions, the resulting min-max optimization problem is convex and can be solved using gradient-based algorithms, as the authors show in Section 4.
  • The authors derive learning guarantees for the ALMO framework that rely on the Rademacher complexity of the family of loss functions and the mixture weights λ.
  • The bound suggests that the learner in the ALMO framework should seek a hypothesis h ∈ H that provides the best trade-off between the empirical loss Lλ(h) and the Rademacher complexity (a schematic form of such a bound is sketched after this list).
  • The authors' experiments serve to illustrate the application of the novel agnostic learning formulation and to support the claim that the algorithm provides more robust results than training with a fixed mixture of base objectives.
  • To solve a complex machine learning problem, the learner would wish to combine multiple loss functions, making use of their strengths while mitigating their weaknesses.
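The digest does not reproduce the bound itself. As a rough guide only, data-dependent Rademacher-complexity guarantees typically take the shape below (cf. Koltchinskii and Panchenko [2002] and Mohri et al. [2012], both cited by the paper); the paper's theorem supplies the exact complexity term for the weighted loss family and its uniformity over λ ∈ Λ, so treat this as a schematic, not the paper's statement.

```latex
% Schematic Rademacher-style guarantee: with probability at least
% 1 - \delta over an i.i.d. sample of size m, for all h \in H,
L_{\lambda}(h) \;\le\; \widehat{L}_{\lambda}(h)
  \;+\; 2\,\mathfrak{R}_m(\mathcal{G}_{\lambda})
  \;+\; M \sqrt{\frac{\log(1/\delta)}{2m}}
% where \mathcal{G}_{\lambda} is the family of \lambda-weighted losses
% induced by H, and M is a uniform bound on the base losses.
```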
Conclusion
  • The authors do not compare against techniques that search for Pareto-efficient solutions over mixtures of losses, since the end goal of those frameworks differs from that of ALMO: a specific point still has to be selected on the Pareto curve afterwards.
  • The authors introduced a new framework (ALMO) for multi-objective learning that is robust against any mixture distribution of the base objectives, avoiding the subjective step of selecting mixture coefficients for multi-loss training.
Tables
  • Table 1: Comparison of loss functions for the logistic regression model on the test set
  • Table 2: Comparison of loss functions for the DNN model on the test set
Funding
  • The work of MM and DS was partly supported by NSF CCF-1535987, NSF IIS-1618662, and a Google Research Award
References
  • M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: a system for large-scale machine learning. In Proceedings of USENIX, 2016.
  • J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica, 2016. URL https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
  • L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
  • Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International Conference on Machine Learning, pages 794–803. PMLR, 2018.
  • F. Chollet et al. Keras. https://keras.io, 2015.
  • D. Dua and C. Graff. UCI machine learning repository, 2017. URL https://archive.ics.uci.edu/ml/datasets/adult.
  • J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.
  • K. Duh, K. Sudoh, X. Wu, H. Tsukada, and M. Nagata. Learning to translate with multiple objectives. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1–10, 2012.
  • P. Godfrey, R. Shipley, and J. Gryz. Algorithms and analyses for maximal vector computation. The VLDB Journal, 16(1):5–28, 2007.
  • J. Hoffman, M. Mohri, and N. Zhang. Algorithms and theory for multiple-source adaptation. In Advances in Neural Information Processing Systems, pages 8246–8256, 2018.
  • H. Isozaki, T. Hirao, K. Duh, K. Sudoh, and H. Tsukada. Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 944–952. Association for Computational Linguistics, 2010.
  • Y. Jin. Multi-objective Machine Learning, volume 16. Springer Science & Business Media, 2006.
  • Y. Jin and B. Sendhoff. Pareto-based multiobjective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(3):397–415, 2008.
  • A. Juditsky, A. Nemirovski, and C. Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
  • A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7482–7491, 2018.
  • J. Kleinberg, S. Mullainathan, and M. Raghavan. Inherent trade-offs in the fair determination of risk scores. In Innovations in Theoretical Computer Science Conference (ITCS), 2017.
  • V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
  • A. Lavie and A. Agarwal. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 228–231, 2007.
  • Y. LeCun and C. Cortes. MNIST handwritten digit database, 2010. URL http://yann.lecun.com/exdb/mnist/.
  • M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.
  • R. T. Marler and J. S. Arora. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, 2004.
  • M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
  • M. Mohri, G. Sivek, and A. T. Suresh. Agnostic federated learning. CoRR, abs/1902.00146, 2019. URL http://arxiv.org/abs/1902.00146.
  • A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.
  • K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics, 2002.
  • H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017. URL http://arxiv.org/abs/1708.07747.
  • L. Zhao, M. Mammadov, and J. Yearwood. From convex to nonconvex: a loss function analysis for binary classification. In 2010 IEEE International Conference on Data Mining Workshops, pages 1281–1288. IEEE, 2010.
Authors
Javier Gonzalvo
Dmitry Storcheus