Learning to Approximate a Bregman Divergence

NeurIPS 2020.


Abstract:

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations. […]

Introduction
  • Bregman divergences arise frequently in machine learning. They play an important role in clustering [3] and optimization [7], and specific Bregman divergences such as the KL divergence and squared Euclidean distance are fundamental in many areas (the general definition is recalled below this list).
  • The goal of this paper is to provide a well-principled framework for learning an arbitrary Bregman divergence from supervision.
  • Such Bregman divergences can be utilized in downstream tasks such as clustering, similarity search, and ranking.
  • The authors prove that the gradients of the proposed max-affine generating functions can approximate the gradient of the convex function being approximated, making them a suitable choice for approximating arbitrary Bregman divergences.
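For reference (this is standard background rather than a result of the paper), a Bregman divergence is generated by a strictly convex, differentiable function \varphi via

\[ D_\varphi(x, y) = \varphi(x) - \varphi(y) - \nabla\varphi(y)^\top (x - y). \]

Taking \varphi(x) = \|x\|_2^2 recovers the squared Euclidean distance D_\varphi(x, y) = \|x - y\|_2^2, and taking the negative entropy \varphi(x) = \sum_i x_i \log x_i on the probability simplex recovers the KL divergence D_\varphi(x, y) = \sum_i x_i \log(x_i / y_i).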
Highlights
  • Bregman divergences arise frequently in machine learning. They play an important role in clustering [3] and optimization [7], and specific Bregman divergences such as the KL divergence and squared Euclidean distance are fundamental in many areas.
  • We prove that the gradients of the max-affine generating functions can approximate the gradient of the convex function they approximate, making them a suitable choice for approximating arbitrary Bregman divergences.
  • We develop a framework for learning arbitrary Bregman divergences by using max-affine generating functions (a minimal sketch of this parameterization follows this list).
  • The metric learning problem is a fundamental problem in machine learning, attracting considerable research and applications.
  • A solid theoretical understanding of the algorithms and methods of metric learning can lead to improvements in combating learning bias in these applications and reduce unnecessary errors in several systems.
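As a concrete illustration of the max-affine parameterization, the sketch below shows how a piecewise-linear convex function induces a divergence. It is a minimal sketch under assumed names (a slope matrix A with K rows and an intercept vector b), not the paper's implementation or fitting procedure; a subgradient of the max-affine function plays the role of the gradient, which is why strict convexity and differentiability are not needed of the approximating class.

```python
import numpy as np

class MaxAffineDivergence:
    """Divergence induced by a max-affine generating function
    phi(x) = max_k (a_k^T x + b_k); a subgradient stands in for the gradient."""

    def __init__(self, A, b):
        self.A = np.asarray(A, dtype=float)  # (K, d): slopes of the K affine pieces
        self.b = np.asarray(b, dtype=float)  # (K,): intercepts of the K affine pieces

    def phi(self, x):
        # Value of the piecewise-linear convex generating function at x.
        return float(np.max(self.A @ x + self.b))

    def subgradient(self, x):
        # The slope of the active (maximizing) piece is a subgradient of phi at x.
        return self.A[int(np.argmax(self.A @ x + self.b))]

    def divergence(self, x, y):
        # Generalized Bregman divergence: phi(x) - phi(y) - g(y)^T (x - y),
        # non-negative for any convex phi and any choice of subgradient g.
        return self.phi(x) - self.phi(y) - float(self.subgradient(y) @ (x - y))

# Toy usage with random placeholder parameters; learning would fit A and b to supervision.
rng = np.random.default_rng(0)
D = MaxAffineDivergence(A=rng.standard_normal((8, 3)), b=rng.standard_normal(8))
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(D.divergence(x, y))  # always >= 0 (up to floating-point rounding)
```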
Methods
  • The authors focus mainly on comparisons to Mahalanobis metric learning methods and their variants for the problems of clustering and ranking (a minimal Bregman clustering sketch follows this list).
  • Compared methods: PBDL (the proposed approach), ITML [10], LMNN [26], GB-LMNN [18], GMML [29], NCA and Kernel NCA [11], MLR-AUC [23], and a plain Euclidean baseline.
  • Reported measures: clustering Rand index (%), cluster purity (%), and ranking AUC (%).
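For context on how a learned divergence is used downstream, the sketch below shows Bregman hard clustering in the style of Banerjee et al. [3]: points are assigned to the centroid with smallest divergence, and centroids are updated as cluster means, which [3] show are optimal under any Bregman divergence. This is a generic illustration with a placeholder divergence, not the paper's experimental pipeline.

```python
import numpy as np

def squared_euclidean(X, mu):
    # Placeholder divergence (the Bregman divergence generated by phi(x) = ||x||^2);
    # a learned divergence would be plugged in here instead.
    return np.sum((X - mu) ** 2, axis=-1)

def bregman_hard_clustering(X, k, divergence, n_iters=50, seed=0):
    """k-means-style clustering for an arbitrary Bregman divergence D(x, mu).
    Banerjee et al. [3] show the cluster mean is the optimal centroid for any such D."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: each point goes to the centroid with the smallest divergence.
        dists = np.stack([divergence(X, mu) for mu in centroids], axis=1)  # (n, k)
        labels = dists.argmin(axis=1)
        # Update step: centroids are cluster means (optimal for any Bregman divergence).
        centroids = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

# Toy usage on two Gaussian blobs with the squared Euclidean stand-in.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 2)), rng.standard_normal((50, 2)) + 4.0])
labels, centroids = bregman_hard_clustering(X, k=2, divergence=squared_euclidean)
```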
Conclusion
  • Discussion and Observations

    On the benchmark datasets examined, the method yields the best or second-best results on 14 of the 16 comparisons (4 datasets by 4 measures per dataset); the next best method (GMML) yields best or second-best results on 8 comparisons.
  • The metric learning problem is a fundamental problem in machine learning, attracting considerable research and applications.
  • These applications include face verification, image retrieval, human activity recognition, program debugging, music analysis, and microarray data analysis.
  • Fundamental work in this problem will help to improve results in these applications as well as lead to further impact in new domains.
  • A solid theoretical understanding of the algorithms and methods of metric learning can lead to improvements in combating learning bias in these applications and reduce unnecessary errors in several systems.
Summary
  • Introduction:

    Bregman divergences arise frequently in machine learning. They play an important role in clustering [3] and optimization [7], and specific Bregman divergences such as the KL divergence and squared Euclidean distance are fundamental in many areas.
  • The goal of this paper is to provide a well-principled framework for learning an arbitrary Bregman divergence from supervision.
  • Such Bregman divergences can be utilized in downstream tasks such as clustering, similarity search, and ranking.
  • The authors prove that the gradients of the proposed max-affine generating functions can approximate the gradient of the convex function being approximated, making them a suitable choice for approximating arbitrary Bregman divergences.
  • Objectives:

    The goal of this paper is to provide a well-principled framework for learning an arbitrary Bregman divergence from supervision.
  • The authors stress that the goal is to approximate Bregman divergences; as such, strict convexity and differentiability are not required of the class of approximators.
  • Methods:

    The authors focus mainly on comparisons to Mahalanobis metric learning methods and their variants for the problems of clustering and ranking.
  • Compared methods: PBDL (the proposed approach), ITML [10], LMNN [26], GB-LMNN [18], GMML [29], NCA and Kernel NCA [11], MLR-AUC [23], and a plain Euclidean baseline.
  • Reported measures: clustering Rand index (%), cluster purity (%), and ranking AUC (%).
  • Conclusion:

    Discussion and Observations

    On the benchmark datasets examined, the method yields the best or second-best results on 14 of the 16 comparisons (4 datasets by 4 measures per dataset); the next best method (GMML) yields best or second-best results on 8 comparisons.
  • The metric learning problem is a fundamental problem in machine learning, attracting considerable research and applications.
  • These applications include face verification, image retrieval, human activity recognition, program debugging, music analysis, and microarray data analysis.
  • Fundamental work in this problem will help to improve results in these applications as well as lead to further impact in new domains.
  • A solid theoretical understanding of the algorithms and methods of metric learning can lead to improvements in combating learning bias in these applications and reduce unnecessary errors in several systems.
Tables
  • Table 1: Learning Bregman divergences (PBDL) compared to existing linear and non-linear metric learning approaches on standard UCI benchmarks. PBDL performs first or second among these benchmarks in 14 of 16 comparisons, outperforming all of the other methods. Note that the top two results for each setting are indicated in bold.
Related work
  • To our knowledge, the only existing work on approximating a Bregman divergence is [27], but this work does not provide any statistical guarantees. They assume that the underlying convex function is of the form $\varphi(x) = \sum_{i=1}^{N} \alpha_i\, h(x^\top x_i)$ with $\alpha_i \ge 0$, where $h(\cdot)$ is a pre-specified convex function such as $|z|^d$. Namely, it is a linear superposition of known convex functions $h(\cdot)$ evaluated on all of the training data. In our preliminary experiments, we have found this assumption to be quite restrictive and to fall well short of state-of-the-art accuracy on benchmark datasets. In contrast to their work, we consider a piecewise linear family of convex functions capable of approximating any convex function; the two parameterizations are contrasted below.

    Other relevant non-linear methods include the kernelization of linear methods, as discussed in [19].
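To make the contrast concrete, the two families of generating functions can be written side by side (the max-affine notation $a_k$, $b_k$ and the number of pieces $K$ are our own shorthand for the piecewise linear family described above, not notation taken from the paper):

\[ \varphi_{[27]}(x) = \sum_{i=1}^{N} \alpha_i\, h(x^\top x_i), \ \alpha_i \ge 0, \qquad \text{versus} \qquad \varphi_{\text{max-affine}}(x) = \max_{1 \le k \le K}\left(a_k^\top x + b_k\right). \]

The former is a fixed superposition anchored to the $N$ training points, whereas max-affine functions can approximate any convex function to arbitrary accuracy as $K$ grows.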
Funding
  • This research was supported by NSF CAREER Award 1559558, CCF-2007350 (VS), CCF-2022446 (VS), CCF-1955981 (VS) and the Data Science Faculty Fellowship from the Rafik B
Study subjects and analysis
standard UCI classification data sets: 4
4.1 Bregman clustering and similarity ranking from relative similarity comparisons. In this experiment we implement PBDL on four standard UCI classification data sets that have previously been used for metric learning benchmarking; see the supplementary material for additional data sets. A minimal sketch of learning from such comparisons is given below.

datasets: 4
4.2 Discussion and Observations. On the benchmark datasets examined, our method yields the best or second-best results on 14 of the 16 comparisons (4 datasets by 4 measures per dataset); the next best method (GMML) yields best or second-best results on 8 comparisons. This suggests that learned Bregman divergences are competitive for downstream clustering and ranking tasks.
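As noted above, supervision takes the form of relative similarity comparisons. The sketch below shows one common way such comparisons are scored with a hinge loss (in the spirit of Schultz and Joachims [24]); it is a generic illustration with placeholder names, not PBDL's actual objective or solver.

```python
import numpy as np

def relative_comparison_loss(divergence, X, triplets, margin=1.0):
    """Hinge loss over relative similarity comparisons: a triplet (i, j, k) states that
    x_i is more similar to x_j than to x_k, so we penalize violations of
    D(x_i, x_j) + margin <= D(x_i, x_k) for a (possibly asymmetric) divergence D."""
    losses = [max(0.0, margin + divergence(X[i], X[j]) - divergence(X[i], X[k]))
              for i, j, k in triplets]
    return float(np.mean(losses))

# Toy usage with a squared-Euclidean stand-in; a learned Bregman divergence would
# replace `sq_euclidean`, and a loss of this kind (or related constraints) drives learning.
sq_euclidean = lambda a, b: float(np.sum((a - b) ** 2))
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
triplets = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(relative_comparison_loss(sq_euclidean, X, triplets))
```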

Reference
  • [1] Gábor Balázs. Convex Regression: Theory, Practice, and Applications. PhD thesis, University of Alberta, 2016.
  • [2] Gábor Balázs, András György, and Csaba Szepesvári. Near-optimal max-affine estimators for convex regression. In AISTATS, 2015.
  • [3] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(Oct):1705–1749, 2005.
  • [4] Aurélien Bellet and Amaury Habrard. Robustness and generalization for metric learning. Neurocomputing, 151:259–267, 2015.
  • [5] Aurélien Bellet, Amaury Habrard, and Marc Sebban. Metric learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 9(1):1–151, 2015.
  • [6] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [7] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967.
  • [8] Qiong Cao, Zheng-Chu Guo, and Yiming Ying. Generalization bounds for metric and similarity learning. Machine Learning, 102(1):115–132, 2016.
  • [9] Kubra Cilingir, Rachel Manzelli, and Brian Kulis. Deep divergence learning. In Proceedings of the International Conference on Machine Learning (ICML), 2020.
  • [10] Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. Information-theoretic metric learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 209–216. ACM, 2007.
  • [11] Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, and Ruslan R. Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems, pages 513–520, 2005.
  • [12] Gaurav Gothoskar, Alex Doboli, and Simona Doboli. Piecewise-linear modeling of analog circuits based on model extraction from trained neural networks. In Proceedings of the 2002 IEEE International Workshop on Behavioral Modeling and Simulation (BMAS 2002), pages 41–46. IEEE, 2002.
  • [13] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2018. URL http://www.gurobi.com.
  • [14] Lauren A. Hannah and David B. Dunson. Ensemble methods for convex regression with applications to geometric programming based circuit design. In Proceedings of the International Conference on Machine Learning (ICML), pages 147–154, 2012.
  • [15] Lauren A. Hannah and David B. Dunson. Multivariate convex regression with adaptive partitioning. Journal of Machine Learning Research, 14(1):3261–3294, 2013.
  • [16] Elad Hoffer and Nir Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015.
  • [17] Pedro Julián, Mario Jordán, and Alfredo Desages. Canonical piecewise-linear approximation of smooth functions. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 45(5):567–571, 1998.
  • [18] Dor Kedem, Stephen Tyree, Fei Sha, Gert R. Lanckriet, and Kilian Q. Weinberger. Non-linear metric learning. In Advances in Neural Information Processing Systems, pages 2573–2581, 2012.
  • [19] Brian Kulis. Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287–364, 2013.
  • [20] Tie-Yan Liu et al. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
  • [21] Alessandro Magnani and Stephen P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10(1):1–17, 2009.
  • [22] Olvi L. Mangasarian, J. Ben Rosen, and M. E. Thompson. Global minimization via piecewise-linear underestimation. Journal of Global Optimization, 32(1):1–9, 2005.
  • [23] Brian McFee and Gert R. Lanckriet. Metric learning to rank. In Proceedings of the International Conference on Machine Learning (ICML), pages 775–782, 2010.
  • [24] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems, pages 41–48, 2004.
  • [25] Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, and Martin J. Wainwright. Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. Journal of Machine Learning Research, 17(58):1–47, 2016. URL http://jmlr.org/papers/v17/15-189.html.
  • [26] Kilian Q. Weinberger and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(Feb):207–244, 2009.
  • [27] Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, and Nenghai Yu. Learning Bregman distance functions and its application for semi-supervised clustering. In Advances in Neural Information Processing Systems, pages 2089–2097, 2009.
  • [28] Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li. Deep metric learning for person re-identification. In International Conference on Pattern Recognition, pages 34–39. IEEE, 2014.
  • [29] Pourya Zadeh, Reshad Hosseini, and Suvrit Sra. Geometric mean metric learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 2464–2471, 2016.