# Learning to Approximate a Bregman Divergence

NeurIPS 2020.

Abstract:

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations.

Introduction

- Bregman divergences arise frequently in machine learning. They play an important role in clustering [3] and optimization [7], and specific Bregman divergences such as the KL divergence and squared Euclidean distance are fundamental in many areas.
- The goal of this paper is to provide a well-principled framework for learning an arbitrary Bregman divergence from supervision.
- Such Bregman divergences can be utilized in downstream tasks such as clustering, similarity search, and ranking.
- The authors prove that the gradients of these approximating functions can approximate the gradient of the underlying convex function, making them a suitable choice for approximating arbitrary Bregman divergences.
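
To make the object being approximated concrete (an illustrative sketch, not code from the paper): a Bregman divergence is generated by a strictly convex, differentiable function φ via D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩, and different choices of φ recover the familiar special cases named above.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 generates the squared Euclidean distance.
sq_norm = lambda x: np.dot(x, x)
grad_sq_norm = lambda x: 2.0 * x

# phi(x) = sum_i x_i log x_i (negative entropy) generates the KL
# divergence for points on the probability simplex.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
d_euclidean = bregman_divergence(sq_norm, grad_sq_norm, x, y)   # ||x - y||^2
d_kl = bregman_divergence(neg_entropy, grad_neg_entropy, x, y)  # KL(x || y)
```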

Highlights

- Bregman divergences arise frequently in machine learning: they play an important role in clustering [3] and optimization [7], and specific Bregman divergences such as the KL divergence and squared Euclidean distance are fundamental in many areas.
- We develop a framework for learning arbitrary Bregman divergences using max-affine generating functions.
- We prove that the gradients of these functions can approximate the gradient of the convex function they approximate, making them a suitable choice for approximating arbitrary Bregman divergences.
- The metric learning problem is a fundamental problem in machine learning, attracting considerable research and applications.
- A solid theoretical understanding of metric learning algorithms can help combat learning bias in these applications and reduce unnecessary errors in several systems.
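
A minimal sketch of the max-affine idea, under my own naming (`max_affine` and `piecewise_bregman` are illustrative, not the authors' code): φ(x) = max_k (aₖ·x + bₖ) is convex and piecewise linear, and the slope of the active piece serves as a subgradient, so a Bregman-style divergence can be formed without strict convexity or differentiability.

```python
import numpy as np

def max_affine(A, b, x):
    """phi(x) = max_k (a_k . x + b_k): convex and piecewise linear.
    Returns the value and a subgradient (the slope of the active piece)."""
    vals = A @ x + b
    k = int(np.argmax(vals))
    return vals[k], A[k]

def piecewise_bregman(A, b, x, y):
    """Bregman-style divergence generated by the max-affine function,
    with the subgradient at y standing in for the gradient."""
    phi_x, _ = max_affine(A, b, x)
    phi_y, g_y = max_affine(A, b, y)
    return phi_x - phi_y - g_y @ (x - y)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))   # K = 8 affine pieces in 3 dimensions
b = rng.normal(size=8)
x, y = rng.normal(size=3), rng.normal(size=3)
d = piecewise_bregman(A, b, x, y)   # nonnegative; exactly 0 when x == y
```

Nonnegativity follows directly from convexity: φ(x) ≥ φ(y) + g_y·(x − y) for any subgradient g_y, so no differentiability is required.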

Methods

- The authors focus mainly on comparisons to Mahalanobis metric learning methods and their variants for the problems of clustering and ranking.
- Compared methods: PBDL, ITML [10], LMNN [26], GB-LMNN [18], GMML [29], NCA [11], Kernel NCA, MLR-AUC [23], and a Euclidean baseline.
- Reported measures: clustering Rand index (%), clustering purity (%), and ranking AUC (%).
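
For reference, the two clustering measures have standard definitions; a quick sketch (standard formulas, not the paper's evaluation code):

```python
from collections import Counter
from itertools import combinations

def purity(pred, true):
    """Fraction of points whose cluster's majority true label matches their own."""
    by_cluster = {}
    for p, t in zip(pred, true):
        by_cluster.setdefault(p, []).append(t)
    majority = sum(Counter(labels).most_common(1)[0][1]
                   for labels in by_cluster.values())
    return majority / len(true)

def rand_index(pred, true):
    """Fraction of point pairs on which the two clusterings agree
    (both together or both apart)."""
    pairs = list(combinations(zip(pred, true), 2))
    agree = sum((p1 == p2) == (t1 == t2) for (p1, t1), (p2, t2) in pairs)
    return agree / len(pairs)
```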

Conclusion

**Discussion and Observations**

On the benchmark datasets examined, the method yields the best or second-best results on 14 of the 16 comparisons (4 datasets by 4 measures per dataset); the next-best method (GMML) yields best or second-best results on 8 comparisons.

- The metric learning problem is a fundamental problem in machine learning, attracting considerable research and applications.
- These applications include face verification, image retrieval, human activity recognition, program debugging, music analysis, and microarray data analysis.
- Fundamental work in this problem will help to improve results in these applications as well as lead to further impact in new domains.
- A solid theoretical understanding of the algorithms and methods of metric learning can lead to improvements in combating learning bias for these applications and reduce unnecessary errors in several systems.

Summary

## Objectives:

The goal of this paper is to provide a well-principled framework for learning an arbitrary Bregman divergence from supervision.

- The authors stress that the goal is to approximate Bregman divergences; as such, strict convexity and differentiability are not required of the class of approximators.
## Conclusion:


- Table 1: Learning Bregman divergences (PBDL) compared to existing linear and non-linear metric learning approaches on standard UCI benchmarks. PBDL places first or second in 14 of 16 comparisons, outperforming all of the other methods. Note that the top two results for each setting are indicated in bold.

Related work

- To our knowledge, the only existing work on approximating a Bregman divergence is [27], but that work provides no statistical guarantees. It assumes the underlying convex function has the form φ(x) = ∑_{i=1}^{N} α_i h(xᵀx_i) with α_i ≥ 0, where h(·) is a pre-specified convex function such as |z|^d; that is, φ is a linear superposition of a known convex function h(·) evaluated on all of the training data. In our preliminary experiments, we found this assumption quite restrictive, falling well short of state-of-the-art accuracy on benchmark datasets. In contrast to their work, we consider a piecewise linear family of convex functions capable of approximating any convex function.
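
For concreteness, the assumed form from [27] can be written down directly (an illustrative sketch; the function name and test data are mine, not from either paper):

```python
import numpy as np

def phi_superposition(alpha, X_train, x, d=2):
    """phi(x) = sum_i alpha_i * h(x^T x_i) with h(z) = |z|**d, alpha_i >= 0.

    Convex in x because each z -> |z|**d (d >= 1) is convex, z = x^T x_i
    is affine in x, and the weights alpha_i are nonnegative.
    """
    assert np.all(alpha >= 0)
    return float(np.sum(alpha * np.abs(X_train @ x) ** d))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))        # N = 5 training points in 3 dimensions
alpha = rng.random(5)              # nonnegative weights
u, v = rng.normal(size=3), rng.normal(size=3)
mid = phi_superposition(alpha, X, (u + v) / 2)
```

Every function representable this way is pinned to the span of {h(xᵀx_i)}, which is what makes the family restrictive compared to a piecewise linear (max-affine) class.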

Other relevant non-linear methods include the kernelization of linear methods, as discussed in [19].

Funding

- This research was supported by NSF CAREER Award 1559558, CCF-2007350 (VS), CCF-2022446 (VS), CCF-1955981 (VS) and the Data Science Faculty Fellowship from the Rafik B

Study subjects and analysis

standard UCI classification data sets: 4

4.1 Bregman clustering and similarity ranking from relative similarity comparisons. In this experiment we implement PBDL on four standard UCI classification data sets that have previously been used for metric learning benchmarking. See the supplementary material for additional data sets.

datasets: 4

4.2 Discussion and Observations. On the benchmark datasets examined, our method yields the best or second-best results on 14 of the 16 comparisons (4 datasets by 4 measures per dataset); the next best method (GMML) yields best or second-best results on 8 comparisons. This suggests that Bregman divergences are competitive for downstream clustering and ranking tasks.

Reference

- Gábor Balázs. Convex Regression: Theory, Practice, and Applications. PhD thesis, University of Alberta, 2016.
- Gábor Balázs, András György, and Csaba Szepesvári. Near-optimal max-affine estimators for convex regression. In AISTATS, 2015.
- Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(Oct):1705–1749, 2005.
- Aurélien Bellet and Amaury Habrard. Robustness and generalization for metric learning. Neurocomputing, 151:259–267, 2015.
- Aurélien Bellet, Amaury Habrard, and Marc Sebban. Metric learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 9(1):1–151, 2015.
- Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
- L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967.
- Qiong Cao, Zheng-Chu Guo, and Yiming Ying. Generalization bounds for metric and similarity learning. Machine Learning, 102(1):115–132, 2016.
- Kubra Cilingir, Rachel Manzelli, and Brian Kulis. Deep divergence learning. In Proceedings of the International Conference on Machine Learning (ICML), 2020.
- Jason V Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S Dhillon. Informationtheoretic metric learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 209–216. ACM, 2007.
- Jacob Goldberger, Geoffrey E Hinton, Sam T Roweis, and Ruslan R Salakhutdinov. Neighbourhood components analysis. In Advances in neural information processing systems, pages 513–520, 2005.
- Gaurav Gothoskar, Alex Doboli, and Simona Doboli. Piecewise-linear modeling of analog circuits based on model extraction from trained neural networks. In Proceedings of the 2002 IEEE International Workshop on Behavioral Modeling and Simulation, 2002. BMAS 2002., pages 41–46. IEEE, 2002.
- LLC Gurobi Optimization. Gurobi optimizer reference manual, 2018. URL http://www.gurobi.com.
- Lauren A Hannah and David B Dunson. Ensemble methods for convex regression with applications to geometric programming based circuit design. In Proceedings of the International Conference on Machine Learning (ICML), pages 147–154, 2012.
- Lauren A Hannah and David B Dunson. Multivariate convex regression with adaptive partitioning. The Journal of Machine Learning Research, 14(1):3261–3294, 2013.
- Elad Hoffer and Nir Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015.
- Pedro Julián, Mario Jordán, and Alfredo Desages. Canonical piecewise-linear approximation of smooth functions. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 45(5):567–571, 1998.
- Dor Kedem, Stephen Tyree, Fei Sha, Gert R Lanckriet, and Kilian Q Weinberger. Non-linear metric learning. In Advances in neural information processing systems, pages 2573–2581, 2012.
- Brian Kulis. Metric learning: A survey. Foundations and Trends® in Machine Learning, 5(4):287–364, 2013.
- Tie-Yan Liu et al. Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval, 3(3):225–331, 2009.
- Alessandro Magnani and Stephen P Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10(1):1–17, 2009.
- Olvi L Mangasarian, J Ben Rosen, and ME Thompson. Global minimization via piecewise-linear underestimation. Journal of Global Optimization, 32(1):1–9, 2005.
- Brian McFee and Gert R Lanckriet. Metric learning to rank. In Proceedings of the International Conference on Machine Learning (ICML), pages 775–782, 2010.
- Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in neural information processing systems, pages 41–48, 2004.
- Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, and Martin J. Wainwright. Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. Journal of Machine Learning Research, 17(58):1–47, 2016. URL http://jmlr.org/papers/v17/15-189.html.
- Kilian Q Weinberger and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(Feb):207–244, 2009.
- Lei Wu, Rong Jin, Steven C Hoi, Jianke Zhu, and Nenghai Yu. Learning bregman distance functions and its application for semi-supervised clustering. In Advances in neural information processing systems, pages 2089–2097, 2009.
- Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z Li. Deep metric learning for person reidentification. In International Conference on Pattern Recognition, pages 34–39. IEEE, 2014.
- Pourya Zadeh, Reshad Hosseini, and Suvrit Sra. Geometric mean metric learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 2464–2471, 2016.
