# Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality

NIPS 2020

Abstract

We consider the problem of estimating the Wasserstein distance between the empirical measure and a set of probability measures whose expectations over a class of functions (hypothesis class) are constrained. If this class is sufficiently rich to characterize a particular distribution (e.g., all Lipschitz functions), then our formulation r...
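One way to write the quantity described in the abstract is given below. This is a hedged sketch, not the paper's exact definition. It assumes $P_n$ is the empirical measure of $n$ samples from $P^*$, $W_c$ is the optimal-transport distance for a cost $c$, and $\mathcal{F}$ is the hypothesis class. The notation $\mathcal{P}_{\mathcal{F}}$ and the equality-constraint form are assumptions introduced for illustration.

```latex
\mathcal{P}_{\mathcal{F}} = \bigl\{ P : \mathbb{E}_{P}[f(X)] = \mathbb{E}_{P^*}[f(X)] \ \text{for all } f \in \mathcal{F} \bigr\},
\qquad
R_n = \inf_{P \in \mathcal{P}_{\mathcal{F}}} W_c(P_n, P).
```

Suppose $\mathcal{F}$ contains all bounded Lipschitz functions. Then these expectations determine a distribution uniquely, so $\mathcal{P}_{\mathcal{F}} = \{P^*\}$ and the infimum reduces to $W_c(P_n, P^*)$. Smaller classes impose fewer constraints, giving larger feasible sets and therefore values of $R_n$ that are no larger.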

Introduction

- In this paper the authors consider the problem of projecting the empirical measure, under the Wasserstein distance, to a set of probability measures that are constrained to satisfy a family of expectations over a class of functions.
- Despite the modelling power of the Wasserstein distance, classical results on its rate of statistical convergence show that these rates scale poorly with the dimension of the space [8].
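- Because a sufficiently rich hypothesis class recovers the Wasserstein distance to a single distribution, the formulation extends conventional results on the rate of statistical convergence of the Wasserstein distance between an empirical distribution and the true distribution.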
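The dimension dependence attributed to [8] can be stated concretely. Informally, let $P_n$ be the empirical measure of $n$ i.i.d. samples from $P^*$ on $\mathbb{R}^d$, with $d \ge 3$ and sufficiently many finite moments. A bound of the following form then holds for some constant $C$ (see [8] for the precise conditions):

```latex
\mathbb{E}\bigl[ W_1(P_n, P^*) \bigr] \le C \, n^{-1/d}, \qquad d \ge 3.
```

For absolutely continuous $P^*$, such as the uniform distribution on $[0,1]^d$, this order cannot be improved in general. Halving the error therefore calls for roughly $2^d$ times as many samples. Under the parametric rate $n^{-1/2}$, a factor of $4$ would suffice.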

Highlights

- In this paper we consider the problem of projecting the empirical measure, under the Wasserstein distance, to a set of probability measures that are constrained to satisfy a family of expectations over a class of functions
- The Wasserstein distance has generated a great deal of attention in recent years across a broad spectrum of areas, ranging from artificial intelligence, learning and statistics to areas such as image analysis, economics and operations research [1, 18, 9, 12, 15]
- Despite the modelling power of the Wasserstein distance, classical results on its rate of statistical convergence show that these rates scale poorly with the dimension of the space [8]. This may suggest that comparing distributions via the Wasserstein distance is bound to suffer from the so-called curse of dimensionality. Such slow theoretical rates seem at odds with the popularity of the Wasserstein distance, which rests on the empirical performance observed in the application areas mentioned above
- The second contribution of this paper is to study the rate of statistical convergence of the robust Wasserstein profile function Rn
- We study cases in which the hypothesis class may form an infinite-dimensional vector space encoding complex information about the joint distribution. For these complex formulations, we show for the first time that it is possible not only to obtain a canonical rate of statistical convergence but also to characterize the limiting distribution
- Motivated by the intuition that decision makers may care only about some characteristics of a distribution rather than every detail of it, we consider the problem of projecting the empirical measure, under the Wasserstein distance, onto a set of probability measures that are constrained to satisfy a family of expectations over a class of functions

Results

- Central limit theorem results of this kind for Rn characterize its rate of statistical convergence and its limiting distribution, and this understanding can inform and guide algorithms, computation, and experiments.
- Following the setting in Example 3, the authors consider a convex compact domain $\Omega$ and let $B_i(\Omega)$ be any subclass of the function class $\{ f|_{\Omega} : f \in C^2(\mathbb{R}) \}$.
- The authors' previous discussions and results on strong duality and statistical convergence have been limited to the case of compact domains.
- The authors then turn to strong duality and statistical convergence when the sample space Ω is not compact.
- They first establish strong duality for non-compact domains and then derive the rate of statistical convergence in that setting, both along the lines of Example 3 above.
- Following the setting in Example 3, for linearly independent unit vectors $\theta_1, \dots, \theta_K$ and $F_B = C_b(\mathbb{R})$, the authors establish strong duality.
- What distinguishes the proof from standard strong-duality arguments in optimal transport is that the usual technique for constructing improving dual functions does not apply, since in general $f^c \notin L_B(\mathbb{R}^d)$.
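The Example 3 setting can be illustrated numerically. The sketch below rests on several assumptions that this summary does not state:

- The cost is Euclidean distance.
- Each constraint requires the projection $\theta_k^\top X$ to have the same distribution under $P$ as under $P^*$. Matching expectations of every $f \in C_b(\mathbb{R})$ along $\theta_k$ would give exactly this.
- A second sample, `reference`, stands in for the unknown $P^*$.

For a unit vector, $x \mapsto \theta_k^\top x$ is 1-Lipschitz. Every feasible $P$ therefore satisfies $W_1(P_n, P) \ge W_1(\theta_k \# P_n, \theta_k \# P^*)$, so the maximum over $k$ of these one-dimensional distances is a lower bound on Rn. All helper names are hypothetical, and this is not the authors' algorithm for computing Rn.

```python
import math
import random


def w1_1d(xs, ys):
    """W1 between two equal-size 1D empirical measures.

    With equal weights, the optimal coupling pairs the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def project(points, theta):
    """Inner product <theta, x> for each point x."""
    return [sum(t * xi for t, xi in zip(theta, x)) for x in points]


def projected_lower_bound(sample, reference, thetas):
    """Hypothetical helper: max over k of W1 between projections along theta_k.

    `reference` is an assumed stand-in for P*; each theta is normalized to a
    unit vector so that the projection is 1-Lipschitz.
    """
    bounds = []
    for theta in thetas:
        norm = math.sqrt(sum(t * t for t in theta))
        unit = [t / norm for t in theta]
        bounds.append(w1_1d(project(sample, unit), project(reference, unit)))
    return max(bounds)


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 5, 200
    # Synthetic standard-normal data, for illustration only.
    sample = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    reference = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Three linearly independent directions (normalized inside the helper).
    thetas = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
    print(projected_lower_bound(sample, reference, thetas))
```

In one dimension, the optimal coupling between two equal-size empirical measures simply pairs sorted values. That is why the projected distances are cheap to compute, even though the full $d$-dimensional transport problem is not.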

Conclusion

- The authors study theoretical aspects of the robust Wasserstein profile function Rn. They believe this work offers insight into why the Wasserstein distance performs well empirically despite the curse of dimensionality.
- Interesting future directions include studying statistical convergence for other general function classes, developing efficient algorithms to compute Rn, and applying the methods and leveraging the theoretical insights in practice.
- Because the paper takes a step towards breaking the curse of dimensionality in statistical rates of convergence, the authors believe their results could enable further applications to multiple hypothesis testing.

Funding

- Material in this paper is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-20-1-0397
- Additional support is gratefully acknowledged from NSF grants 1915967, 1820942 and 1838576

Study subjects and analysis

Figure: Left: projections of $P^*$ (blue shade) and $P^*_{\mathrm{alt}}$ (red shade) along the three $\theta_j$ directions. Right: histograms of 50 samples of $R_n$ (blue) and $R_n^{\mathrm{alt}}$ (red), with the 95% quantile of $R_n$ marked as a dashed black line.
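The figure compares draws of $R_n$ against the 95% quantile of its null distribution, which suggests a simple test. A minimal sketch of that decision rule follows. It assumes draws of $R_n$ under the null are already available; how the paper produces them is not described in this summary. The function names and the simulated numbers in the usage block are hypothetical.

```python
import math
import random


def empirical_quantile(values, level):
    """Lower empirical quantile.

    Returns the smallest sample value v such that at least a `level`
    fraction of the values are <= v.
    """
    ordered = sorted(values)
    k = math.ceil(level * len(ordered))
    return ordered[max(k, 1) - 1]


def reject_null(observed, null_draws, level=0.95):
    """Reject when the observed statistic exceeds the empirical `level`
    quantile of statistics simulated under the null hypothesis."""
    return observed > empirical_quantile(null_draws, level)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-ins for 50 simulated values of R_n under the null.
    null_draws = [abs(rng.gauss(0.0, 1.0)) for _ in range(50)]
    threshold = empirical_quantile(null_draws, 0.95)
    print(f"95% threshold: {threshold:.3f}")
    print("reject" if reject_null(2.5, null_draws) else "do not reject")
```

With 50 draws and level 0.95, the threshold is the 48th smallest draw. This is the kind of dashed line shown in the right panel.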

Reference

- Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
- Jose Blanchet, Lin Chen, and Xun Yu Zhou. Distributionally robust mean-variance portfolio selection with Wasserstein distances. arXiv preprint arXiv:1802.04885, 2018.
- Jose Blanchet, Yang Kang, and Karthyek Murthy. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability, 56(3):830–857, 2019.
- Jose Blanchet and Karthyek Murthy. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600, 2019.
- Jose Blanchet, Karthyek Murthy, and Nian Si. Confidence regions in Wasserstein distributionally robust estimation. arXiv preprint arXiv:1906.01614, 2019.
- Sergey Bobkov and Michel Ledoux. One-dimensional empirical measures, order statistics, and Kantorovich transport distances, volume 261. American Mathematical Society, 2019.
- Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, and Alexander G Schwing. Max-Sliced Wasserstein distance and its use for GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10648–10656, 2019.
- Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707–738, 2015.
- Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, and Tomaso A. Poggio. Learning with a Wasserstein loss. In Advances in Neural Information Processing Systems (NIPS) 28, 2015.
- Rui Gao and Anton J Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
- Aude Genevay, Lénaic Chizat, Francis Bach, Marco Cuturi, and Gabriel Peyré. Sample complexity of Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1574–1583. PMLR, 2019.
- Jun-Ya Gotoh, Michael Jong Kim, and Andrew EB Lim. Calibration of distributionally robust empirical optimization models. arXiv preprint arXiv:1711.06565, 2017.
- A. Graps. An introduction to wavelets. IEEE Computational Science and Engineering, 2(2):50–61, 1995.
- Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 2012.
- S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Processing Magazine, 34(4):43–59, July 2017.
- Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166, Sep 2018.
- Dimitris N Politis, Joseph P Romano, and Michael Wolf. Subsampling. Springer Science & Business Media, 1999.
- Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
- Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
- Carla Tameling, Max Sommerfeld, and Axel Munk. Empirical optimal transport on countable metric spaces: Distributional limits and statistical applications. The Annals of Applied Probability, 29(5):2744–2781, 2019.
- Aad W Van der Vaart. Asymptotic statistics. Cambridge University Press, 2000.
- A.W. Van der Vaart and J.A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, 1996.
- Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
- Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business
- Jonathan Weed and Francis Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A):2620–2648, 2019.
- Jonathan Weed and Quentin Berthet. Estimation of smooth densities in Wasserstein distance. arXiv preprint arXiv:1902.01778, 2019.
- Chaoyue Zhao and Yongpei Guan. Data-driven risk-averse stochastic optimization with Wasserstein metric. Operations Research Letters, 46(2):262–267, 2018.
