Distributionally Robust Local Non-parametric Conditional Estimation

NeurIPS 2020

Abstract

Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is ubiquitously useful with applications in engineering, social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data)...

Introduction
  • The authors consider the estimation of conditional statistics of a response variable, Y ∈ Rm, given the value of a predictor or covariate X ∈ Rn.
  • The authors propose a distributionally robust local conditional estimation problem that minimizes over β the worst-case conditional expected loss over an ambiguity set of distributions (a reconstructed formulation is sketched after this list).
  • The authors demonstrate that when the ambiguity set is a type-∞ Wasserstein ball around the empirical measure, the proposed min-max estimation problem can be efficiently solved in many applicable settings, including notably local conditional mean and quantile estimation.
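The following is a hedged reconstruction of the objects just described, assuming the standard definition of the type-∞ Wasserstein distance and writing the conditioning event as a γ-neighborhood 𝒩_γ(x_0) of the query point; the neighborhood notation is inferred from the conditional ambiguity set Bx0,γ appearing later in this summary, not copied verbatim from the paper:

```latex
% Type-infinity Wasserstein distance: essential supremum of the transport
% displacement over all couplings of Q_1 and Q_2.
W_\infty(\mathbb{Q}_1, \mathbb{Q}_2)
  = \inf_{\pi \in \Pi(\mathbb{Q}_1, \mathbb{Q}_2)}
    \operatorname*{ess\,sup}_{(\xi_1, \xi_2) \sim \pi} \, \lVert \xi_1 - \xi_2 \rVert

% Ambiguity set: a type-infinity Wasserstein ball of radius rho around the
% empirical measure, and the distributionally robust local conditional
% estimation problem (conditioning on a gamma-neighborhood of x_0).
\mathbb{B}^{\infty}_{\rho}
  = \bigl\{ \mathbb{Q} : W_\infty\bigl(\mathbb{Q}, \widehat{\mathbb{P}}_N\bigr) \le \rho \bigr\},
\qquad
\min_{\beta \in \mathbb{R}^m} \;
\sup_{\mathbb{Q} \in \mathbb{B}^{\infty}_{\rho}}
\mathbb{E}_{\mathbb{Q}}\bigl[\, \ell(Y, \beta) \,\big|\, X \in \mathcal{N}_{\gamma}(x_0) \,\bigr]
```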
Highlights
  • We consider the estimation of conditional statistics of a response variable, Y ∈ Rm, given the value of a predictor or covariate X ∈ Rn
  • The conditional mean E[Y | X] can be characterized as the minimizer of a least-squares regression problem, where the minimization is taken over the space of all measurable functions from Rn to Rm
  • We introduce a novel paradigm of non-parametric local conditional estimation based on distributionally robust optimization
  • We demonstrate that when the ambiguity set is a type-∞ Wasserstein ball around the empirical measure, the proposed min-max estimation problem can be efficiently solved in many applicable settings, including notably the local conditional mean and quantile estimation
  • Since our contribution is primarily on introducing a novel conceptual paradigm powered by distributionally robust optimization (DRO), we focus on discussing well-understood estimators that encompass most of the conceptual ideas used to mitigate the challenges exposed earlier
  • The conditional mean estimation problem is challenging when x0 is close to the jump points of the density function p(x), that is, at x0 = 0.3 or x0 = 0.7, because the data are distributed unevenly in the neighborhoods of these points (a small sampling sketch follows this list)
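To make the difficulty concrete, here is a minimal sketch of such a covariate distribution, assuming an illustrative piecewise-constant density with jump points at 0.3 and 0.7; the segment probabilities below are a guess for illustration, since the summary only states where the jumps are:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_piecewise_density(n):
    """Draw covariates X from a piecewise-constant density on [0, 1] with
    jump points at 0.3 and 0.7 (segment probabilities are illustrative)."""
    edges = np.array([0.0, 0.3, 0.7, 1.0])
    seg_prob = np.array([0.1, 0.8, 0.1])        # most mass in the middle segment
    seg = rng.choice(3, size=n, p=seg_prob)
    return rng.uniform(edges[seg], edges[seg + 1])

X = sample_piecewise_density(1000)
# Near a jump point such as x0 = 0.3, one side of the neighborhood is densely
# sampled and the other side sparsely, which is what makes local conditional
# mean estimation hard there.
left = np.sum((X > 0.25) & (X < 0.3))
right = np.sum((X >= 0.3) & (X < 0.35))
print(left, right)   # markedly unbalanced counts
```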
Results
  • Section 2: Local Conditional Estimate using Type-∞ Wasserstein Ambiguity Set
  • To solve the estimation problem (2), the authors study the worst-case conditional expected loss function f(β).
  • The distributionally robust local conditional estimation problem (2) is equivalent to a second-order cone program that minimizes an auxiliary epigraph variable λ subject to second-order cone constraints.
  • The values of α calculated in Theorem 2.3 are indicative: αi = 1 if it is optimal to perturb sample point i when computing the worst-case conditional expected loss.
  • By decomposing the measure Q using the set of probability measures πi and exploiting the definition of the type-∞ Wasserstein distance, as in the proof of Proposition 2.2, the authors arrive at the stated reformulation.
  • Let I and I1 be the index sets defined in (4a)-(4b); the value f(β) equals the optimal value of a linear-fractional program of the form f(β) = max Σ_{i∈I} v_i(β) α_i / Σ_{i∈I} α_i over feasible weights α (see the Charnes-Cooper sketch after this list).
  • Before proving Proposition 2.5, the authors need the following two results, which assert the analytical optimal value of maximizing a convex quadratic function over a norm ball.
  • To facilitate the proof of Lemma B.3, the authors define the conditional ambiguity set Bx0,γ(B∞ρ) induced by B∞ρ.
  • The last constraint defining the set Bx0,γ(B∞ρ) follows from the disintegration of the joint measure into a marginal distribution and the corresponding conditional distributions [39, Theorem 9.2.2].
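The linear-fractional structure of f(β) noted above can be handled with the classical Charnes-Cooper transformation [11]. Below is a minimal, self-contained sketch under an illustrative box constraint; the feasible set, the stand-in values v_i(β), and the helper name maximize_linear_fractional are assumptions for illustration and do not reproduce the paper's constraints (4a)-(4b):

```python
import numpy as np
from scipy.optimize import linprog

def maximize_linear_fractional(c, d, A, b):
    """Maximize (c @ alpha) / (d @ alpha) subject to A @ alpha <= b, alpha >= 0
    (assuming d @ alpha > 0 on the feasible set), via the Charnes-Cooper
    transformation: set y = t * alpha with t = 1 / (d @ alpha) and solve the LP
        max c @ y   s.t.   A @ y <= b * t,   d @ y = 1,   y >= 0,  t >= 0."""
    n = len(c)
    # Decision vector z = [y_1, ..., y_n, t]; linprog minimizes, so negate the objective.
    obj = np.concatenate([-np.asarray(c, float), [0.0]])
    A_ub = np.hstack([np.asarray(A, float), -np.asarray(b, float).reshape(-1, 1)])  # A y - b t <= 0
    b_ub = np.zeros(np.asarray(A).shape[0])
    A_eq = np.concatenate([np.asarray(d, float), [0.0]]).reshape(1, -1)             # d y = 1
    b_eq = np.array([1.0])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    y, t = res.x[:n], res.x[n]
    return y / t, -res.fun  # recover alpha = y / t and the optimal ratio value

# Illustrative instance: maximize sum_i v_i * alpha_i / sum_i alpha_i over 0 <= alpha_i <= 1.
v = np.array([0.2, 0.9, 0.5])   # stand-in values for v_i(beta)
alpha_star, value = maximize_linear_fractional(v, np.ones(3), np.eye(3), np.ones(3))
print(alpha_star, value)        # puts all weight on the largest v_i, value 0.9
```

The transformation normalizes the denominator to one, so the ratio objective becomes a linear objective and any off-the-shelf LP solver applies.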
Conclusion
  • The proof of Lemma B.3 relies on the following two results which assert the convexity of the joint ambiguity set B∞ ρ and its induced conditional ambiguity set Bx0,γ(B∞ ρ ).
  • By the definition of the conditional ambiguity set Bx0,γ(B∞ρ), it suffices to prove the equivalence between min_{β∈Rm} sup_{μ0∈Bx0,γ(B∞ρ)} E_{μ0}[ℓ(Y, β)] and the original worst-case conditional estimation problem (2).
  • The authors elaborate here on the procedure of applying a golden-section search to solve a one-dimensional local conditional estimation problem with a convex loss function (sketched below).
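As a concrete illustration of that procedure, here is a minimal sketch of a golden-section search, assuming a generic convex (unimodal) one-dimensional objective; the stand-in loss in the usage example is illustrative and is not the paper's worst-case conditional loss:

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a one-dimensional convex (unimodal) function f on [lo, hi]
    by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:           # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                 # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Usage with a stand-in convex loss; in the paper's setting f would be the
# worst-case conditional expected loss beta -> sup_Q E_Q[l(Y, beta) | ...].
beta_hat = golden_section_minimize(lambda beta: (beta - 1.3) ** 2, -10.0, 10.0)
print(beta_hat)  # approximately 1.3
```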
Tables
  • Table 1: Median of hyper-parameters (H.P.) obtained with cross-validation
  • Table 2: Comparison of expected out-of-sample classification accuracy (in %)
  • Table 3: Median of hyper-parameters (H.P.) for the synthetic data experiment obtained with cross-validation
Related Work
  • One can argue that every single prediction task in machine learning ultimately relates to conditional estimation. So, attempting to provide a full literature survey on non-parametric conditional estimation is an impossible task. Since our contribution is primarily on introducing a novel conceptual paradigm powered by DRO, we focus on discussing well-understood estimators that encompass most of the conceptual ideas used to mitigate the challenges exposed earlier.

    The challenges of conditioning on zero probability events and the fact that x0 may not be a part of the sample are addressed based on the idea of averaging around a neighborhood of the point of interest and smoothing. This gives rise to estimators such as k-NN (see, for example, [13]) and kernel density estimators, including, for instance, the Nadaraya-Watson estimator [31, 43] and the Epanechnikov estimator [14], among others. Additional averaging methods include, for example, random forests [9] and Classification and Regression Trees (CARTs) [10]; see also [20] for other techniques.
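For reference, a minimal sketch of the classical Nadaraya-Watson estimator mentioned above, with a Gaussian kernel and a hand-picked bandwidth; this is the textbook local averaging estimator, not the paper's distributionally robust estimator:

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h=0.1):
    """Kernel-weighted local average estimate of E[Y | X = x0].
    X: (N, n) covariates, Y: (N, m) responses, h: bandwidth."""
    # Gaussian kernel weights based on squared distance to the query point x0
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2 * h ** 2))
    w = w / w.sum()
    return w @ Y  # weighted average of the responses

# Toy usage: Y = sin(2*pi*X) + noise, estimate the conditional mean at x0 = 0.5
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal((200, 1))
print(nadaraya_watson(np.array([0.5]), X, Y, h=0.05))  # close to sin(pi) = 0
```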
Funding
  • Material in this paper is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-20-1-0397
  • Additional support is gratefully acknowledged from NSF grants 1915967, 1820942, 1838676 and from the China Merchant Bank.
Data and Analysis
data: 100
Finally, we report on an experiment that challenges the capacity of both the N-W and DRCME estimators to be resilient to adversarial corruption of the test images. This is done by exposing the two estimators to images from the training set (N = 100) that have been corrupted in a way that makes them resemble the closest differently-labeled image in the set. Figure 5 presents several visual examples of the progressively corrupted images and the resulting N-W and DRCME estimations.
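The corruption procedure is only described qualitatively here; the following sketch interprets it as linear interpolation of each image towards its nearest differently-labeled image, which is an assumption for illustration (the helper name and the interpolation fraction t are hypothetical):

```python
import numpy as np

def corrupt_towards_nearest_other_label(X, y, t=0.5):
    """Move each image a fraction t of the way towards its nearest
    differently-labeled image (Euclidean distance in pixel space).
    This linear interpolation is an illustrative guess at "corrupted in a way
    that makes them resemble the closest differently-labeled image"; the
    paper's exact corruption procedure may differ."""
    X_corrupted = X.copy()
    for i in range(len(X)):
        other = np.flatnonzero(y != y[i])                  # indices with a different label
        d = np.linalg.norm(X[other] - X[i], axis=1)
        j = other[np.argmin(d)]                            # nearest differently-labeled image
        X_corrupted[i] = (1 - t) * X[i] + t * X[j]
    return X_corrupted

# Toy usage on random "images" (flattened 28x28 vectors with binary labels)
rng = np.random.default_rng(0)
X = rng.random((100, 28 * 28))
y = rng.integers(0, 2, size=100)
X_half = corrupt_towards_nearest_other_label(X, y, t=0.5)  # increase t for heavier corruption
```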

References
  • [1] C. D. Aliprantis and K. C. Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 2006.
  • [2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming, 95:3–51, 2003.
  • [3] D. P. Bertsekas. Control of Uncertain Systems with a Set-Membership Description of Uncertainty. PhD thesis, Massachusetts Institute of Technology, 1971.
  • [4] D. Bertsimas, V. Gupta, and N. Kallus. Data-driven robust optimization. Mathematical Programming, 167(2):235–292, 2018.
  • [5] D. Bertsimas, C. McCord, and B. Sturt. Dynamic optimization with side information. arXiv preprint arXiv:1907.07307, 2019.
  • [6] D. Bertsimas, S. Shtern, and B. Sturt. Two-stage sample robust optimization. arXiv preprint arXiv:1907.07142, 2019.
  • [7] R. Bhattacharjee and K. Chaudhuri. When are non-parametric methods robust? In International Conference on Machine Learning, 2020.
  • [8] J. Blanchet and K. Murthy. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600, 2019.
  • [9] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
  • [10] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth and Brooks, 1984.
  • [11] A. Charnes and W. W. Cooper. Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9(3-4):181–186, 1962.
  • [12] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
  • [13] L. Devroye. The uniform convergence of nearest neighbor regression function estimators and their application in optimization. IEEE Transactions on Information Theory, 24(2):142–151, 1978.
  • [14] V. A. Epanechnikov. Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 14(1):153–158, 1969.
  • [15] R. Flamary and N. Courty. POT: Python Optimal Transport library, 2017.
  • [16] R. Gao and A. J. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
  • [17] N. García Trillos and D. Slepcev. On the rate of convergence of empirical measures in ∞-transportation distance. Canadian Journal of Mathematics, 67(6):1358–1383, 2015.
  • [18] C. Givens and R. Shortt. A class of Wasserstein metrics for probability distributions. The Michigan Mathematical Journal, 31(2):231–240, 1984.
  • [19] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the Third International Conference on Learning Representations, 2015.
  • [20] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
  • [22] M. Khoury and D. Hadfield-Menell. On the geometry of adversarial examples. arXiv preprint arXiv:1811.00525, 2018.
  • [23] S. Kruk and H. Wolkowicz. Pseudolinear programming. SIAM Review, 41(4):795–805, 1999.
  • [24] D. Kuhn, P. M. Esfahani, V. A. Nguyen, and S. Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. INFORMS TutORials in Operations Research, pages 130–166, 2019.
  • [25] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In Proceedings of the Fifth International Conference on Learning Representations, 2017.
  • [26] Y. LeCun and C. Cortes. The MNIST Database of Handwritten Digits, 1998 (accessed May 28, 2020).
  • [27] X. Li, Y. Chen, Y. He, and H. Xue. AdvKNN: Adversarial attacks on k-nearest neighbor classifiers with approximate gradients. arXiv preprint arXiv:1911.06591, 2019.
  • [28] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of the Sixth International Conference on Learning Representations, 2018.
  • [29] P. Mohajerin Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166, 2018.
  • [30] MOSEK ApS. MOSEK Optimizer API for Python 9.2.10, 2019.
  • [31] E. A. Nadaraya. On estimating regression. Theory of Probability & Its Applications, 9(1):141–142, 1964.
  • [32] H. Namkoong and J. C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems 30, pages 2971–2980, 2017.
  • [33] V. A. Nguyen, D. Kuhn, and P. Mohajerin Esfahani. Distributionally robust inverse covariance estimation: The Wasserstein shrinkage estimator. arXiv preprint arXiv:1805.07194, 2018.
  • [34] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.
  • [35] S. Shafieezadeh-Abadeh, D. Kuhn, and P. M. Esfahani. Regularization via mass transportation. Journal of Machine Learning Research, 20(103):1–68, 2019.
  • [36] A. Sinha, H. Namkoong, and J. Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
  • [37] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
  • [38] C. J. Stone. Consistent nonparametric regression. Annals of Statistics, 5(4):595–620, 1977.
  • [39] D. Stroock. Probability Theory: An Analytic View. Cambridge University Press, 2011.
  • [40] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. D. McDaniel. Ensemble adversarial training: Attacks and defenses. In Proceedings of the Sixth International Conference on Learning Representations, 2018.
  • [41] L. Wang, X. Liu, J. Yi, Z.-H. Zhou, and C.-J. Hsieh. Evaluating the robustness of nearest neighbor classifiers: A primal-dual perspective. arXiv preprint arXiv:1906.03972, 2019.
  • [42] Y. Wang, S. Jha, and K. Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. In International Conference on Machine Learning, pages 5133–5142, 2018.
  • [43] G. S. Watson. Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, pages 359–372, 1964.
  • [44] W. Xie. Tractable reformulations of distributionally robust two-stage stochastic programs with ∞-Wasserstein distance. arXiv preprint arXiv:1908.08454, 2019.
  • [45] Y.-Y. Yang, C. Rashtchian, Y. Wang, and K. Chaudhuri. Robustness for non-parametric classification: A generic attack and defense. In International Conference on Artificial Intelligence and Statistics, 2020.
  • [46] G. Zhao and Y. Ma. Robust nonparametric kernel regression estimator. Statistics & Probability Letters, 116:72–79, 2016.
Authors
Viet Anh Nguyen
Fan Zhang
Jose Blanchet