
Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery

NeurIPS 2020


Abstract

Additive models have attracted much attention for high-dimensional regression estimation and variable selection. However, the existing models are usually limited to the single-task learning framework under the mean squared error (MSE) criterion, where the utilization of variable structure depends heavily on a priori knowledge among variables...

Introduction
  • Additive models [14], as a nonparametric extension of linear models, have been extensively investigated in the machine learning literature [1, 5, 34, 44].
  • Typical additive models are usually formulated under Tikhonov regularization schemes and fall into two categories: one focuses on recognizing dominant variables without considering the interaction among variables [21, 28, 29, 46], while the other aims to screen informative variables at the group level, e.g., groupwise additive models [4, 42].
  • The groupwise additive models depend heavily on a priori knowledge of the variable structure.
  • The authors consider a problem commonly encountered in multi-task learning, in which all tasks share an underlying variable structure and involve data with complex non-Gaussian noise, e.g., skewed noise; a minimal formulation sketch follows this list.
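As background for the setting these bullets describe, here is the textbook additive-model objective under a Tikhonov (regularized least squares) scheme in the spirit of [14]; the notation (λ for the regularization parameter, Ω for a component-wise complexity penalty) is generic, and the paper's actual objective replaces the squared loss with a mode-induced one:

    f(x) \;=\; \sum_{j=1}^{p} f_j(x_j), \qquad
    \hat{f} \;=\; \operatorname*{arg\,min}_{f_1,\dots,f_p}\;
        \frac{1}{n}\sum_{i=1}^{n}\Big( y_i - \sum_{j=1}^{p} f_j(x_{ij}) \Big)^{2}
        \;+\; \lambda \sum_{j=1}^{p} \Omega(f_j)

Groupwise variants replace the per-coordinate penalty with a penalty over predefined groups of variables, which is exactly where the a priori structural knowledge enters.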
Highlights
  • Additive models [14], as a nonparametric extension of linear models, have been extensively investigated in the machine learning literature [1, 5, 34, 44].
  • We consider a problem commonly encountered in multi-task learning, in which all tasks share an underlying variable structure and involve data with complex non-Gaussian noise, e.g., skewed noise.
  • To relax the dependence on a priori structure and Gaussian noise, this paper proposes a class of Multi-task Additive Models (MAM) by integrating an additive hypothesis space, a mode-induced metric [6, 41, 10], and a structure-based regularizer [12] into a bilevel learning framework (see the sketch after this list).
  • This paper proposes the multi-task additive models to achieve robust estimation and automatic structure discovery.
  • As far as we know, it is novel to explore robust interpretable machine learning by integrating modal regression, additive models, and multi-task learning.
  • Our MAM can also be applied to other fields, e.g., gene expression analysis and drug discovery.
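To make the mode-induced metric above concrete, here is a minimal numerical sketch of the modal-regression principle behind it (the Gaussian-kernel formulation is the standard one from [6, 41], not necessarily the paper's implementation): instead of averaging squared residuals, one maximizes a kernel density estimate of the residuals at zero, so gross outliers from skewed or heavy-tailed noise barely move the criterion.

    import numpy as np

    def mode_induced_risk(residuals, sigma=1.0):
        # Negative kernel density estimate of the residuals at zero
        # (Gaussian kernel with bandwidth sigma). Modal regression seeks f
        # maximizing (1/n) * sum_i K_sigma(y_i - f(x_i)); minimizing this
        # negative value prefers residuals concentrated near zero.
        r = np.asarray(residuals, dtype=float)
        kernel = np.exp(-r**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
        return -kernel.mean()

    # Two gross outliers barely change the mode-induced risk, unlike the MSE.
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 0.1, size=100)
    noisy = np.concatenate([clean, [25.0, -30.0]])
    print(mode_induced_risk(clean), mode_induced_risk(noisy))   # nearly identical
    print((clean**2).mean(), (noisy**2).mean())                 # MSE explodes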
Methods
  • The simulation study compares MAM with BiGL, mGAM, GL, Lasso, and RMR under three noise settings: |V| = 0 outliers with Gaussian noise, |V| = 5 outliers with Gaussian noise, and |V| = 5 outliers with Student noise, evaluated by the ASE, TD, and WPI (SCP) criteria.
  • The real-data comparison reports the average absolute error (AAE, in hours) and the dominant variable groups recovered for each task; e.g., MAM attains an AAE of 9.07 with dominant groups G1, G2, G7 on the first task, versus first-task AAEs of 11.09 for BiGL, 12.16 for Lasso, and 12.02 for RMR.
Results
  • Table 1 contrasts the methods along their hypothesis space (linear vs. additive), learning framework (single-task vs. multi-task), evaluation criterion (mean-induced vs. mode-induced), objective function (convex vs. nonconvex), robust estimation, sparsity on grouped features, sparsity on individual features, and variable structure discovery.
  • The baselines are RMR [38] (linear), GroupSpAM [42] (additive, single-task, mean-induced, convex, with × in three of the property columns), CGSI [26] (additive, single-task, mean-induced, convex), and BiGL [12] (linear).
  • Denote f(t) and f*(t) as the estimator and the ground-truth function, respectively. The evaluation criteria used here include the true deviation (TD), which measures the discrepancy between f and f*, and variable structure recovery; plausible formulas are spelled out after this list.
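The extraction did not preserve the exact formulas; definitions consistent with the criterion names (an assumption, not quoted from the paper) would be

    \mathrm{ASE} \;=\; \frac{1}{n}\sum_{i=1}^{n}\big(f(t_i)-f^{*}(t_i)\big)^{2},
    \qquad
    \mathrm{TD} \;=\; \sup_{t}\,\big|f(t)-f^{*}(t)\big|,

i.e., an average squared error over evaluation points and a worst-case deviation from the ground truth.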
Conclusion
  • This paper proposes the multi-task additive models to achieve robust estimation and automatic structure discovery.
  • It would be interesting to investigate robust additive models for overlapping variable structure discovery [17].
  • The positive impacts of this work are two-fold: 1) the authors' algorithmic framework paves a new way for mining the intrinsic feature structure among high-dimensional variables, and may be a stepping stone toward data-driven structure discovery with overlapping groups; 2) MAM can be applied to other fields, e.g., gene expression analysis and drug discovery.
  • There is a risk of producing an unstable estimation when facing ultra-high-dimensional data.
Tables
  • Table 1: Algorithmic properties (✓ = has the given property, × = lacks it)
  • Table 2: Performance comparisons on Example A (top) and Example B (bottom) w.r.t. different criteria
  • Table 3: Average absolute error and dominant group for each task
Related Work
  • There are some works on automatic structure discovery in additive models [26, 40] and partially linear models [19, 45]. Different from our MAM, these approaches are formulated under the single-task framework and the MSE criterion, which makes them sensitive to non-Gaussian noise and unable to tackle multi-task structure discovery directly. While some mode-based approaches have been designed for robust estimation, e.g., regularized modal regression (RMR) [38], none of them considers automatic structure discovery. Recently, an extension of the group lasso was formulated for variable structure discovery [12]. Although this approach can induce data-driven sparsity at the group level, it is limited to linear mean regression and ignores sparsity with respect to individual features (see the penalty sketch below). To better highlight the novelty of MAM, its algorithmic properties are summarized in Table 1 against RMR [38], Group Sparse Additive Models (GroupSpAM) [42], capacity-based group structure identification (CGSI) [26], and bilevel learning of the group lasso structure (BiGL) [12].
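The group-level versus individual-level sparsity distinction drawn above is the difference between the group lasso and the sparse group lasso penalties [13, 33, 43]; in generic notation (not the paper's regularizer), for coefficients w partitioned into groups g ∈ G of size d_g:

    \Omega_{\mathrm{GL}}(w) \;=\; \lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\,\lVert w_g \rVert_2,
    \qquad
    \Omega_{\mathrm{SGL}}(w) \;=\; (1-\alpha)\,\lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\,\lVert w_g \rVert_2 \;+\; \alpha\,\lambda\,\lVert w \rVert_1

The group lasso zeroes out whole groups but keeps every coordinate of a selected group active; the additional ℓ1 term in the sparse group lasso also removes individual features inside selected groups, which is the per-feature sparsity that [12] ignores.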
Funding
  • This work was supported by the National Natural Science Foundation of China (NSFC) under grants 11671161, 12071166, 61972188, and 41574181, and by NSERC grant RGPIN-2016-05024.
Study Subjects and Analysis
ICME observations: 137
Interplanetary CME (ICME) data are provided in the Richardson and Cane List (http://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm). From this list, we collect 137 ICME observations from 1996 to 2016. The features of the CMEs are provided in the SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/).
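A minimal sketch of how such a dataset could be assembled, assuming both catalogs have been exported to CSV; the file names, column names, and the cme_id join key below are hypothetical illustrations, not taken from the paper:

    import pandas as pd

    # Hypothetical local exports of the two public catalogs.
    icmes = pd.read_csv("richardson_cane_icmes.csv", parse_dates=["disturbance_time"])
    cmes = pd.read_csv("soho_lasco_cme_catalog.csv", parse_dates=["onset_time"])

    # Keep the 1996-2016 window described above.
    icmes = icmes[icmes["disturbance_time"].between("1996-01-01", "2016-12-31")]

    # Pair each ICME with its source CME to obtain the CME features
    # ("cme_id" is a hypothetical key linking the two tables).
    data = icmes.merge(cmes, on="cme_id", how="inner")
    print(len(data))  # 137 matched ICME observations in the paper's setting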

References
  • R. Agarwal, N. Frosst, X. Zhang, R. Caruana, and G. E. Hinton. Neural additive models: Interpretable machine learning with neural nets. arXiv:2004.13912v1, 2020.
  • R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
  • H. Chen, G. Liu, and H. Huang. Sparse shrunk additive models. In International Conference on Machine Learning (ICML), 2020.
  • H. Chen, X. Wang, C. Deng, and H. Huang. Group sparse additive machine. In Advances in Neural Information Processing Systems (NIPS), pages 198–208. 2017.
  • H. Chen, Y. Wang, F. Zheng, C. Deng, and H. Huang. Sparse modal additive model. IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2020.3005144, 2020.
  • Y. C. Chen, C. R. Genovese, R. J. Tibshirani, and L. Wasserman. Nonparametric modal regression. The Annals of Statistics, 44(2):489–514, 2016.
  • B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization. Annals of Operations Research, 153(1):235–256, 2007.
  • L. Condat. Fast projection onto the simplex and the ℓ1 ball. Mathematical Programming, 158(1-2):575–585, 2016.
  • T. Evgeniou and M. Pontil. Regularized multi-task learning. In ACM Conference on Knowledge Discovery and Data Mining, pages 109–117, 2004.
  • Y. Feng, J. Fan, and J. Suykens. A statistical learning approach to modal regression. Journal of Machine Learning Research, 21(2):1–35, 2020.
  • L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In International Conference on Machine learning (ICML), pages 1563–1572, 2018.
  • J. Frecon, S. Salzo, and M. Pontil. Bilevel learning of the group lasso structure. In Advances in Neural Information Processing Systems (NIPS), pages 8301–8311, 2018.
  • J. H. Friedman, T. Hastie, and R. Tibshirani. A note on the group lasso and a sparse group lasso. arXiv:1001.0736, 2010.
  • T. J. Hastie and R. J. Tibshirani. Generalized additive models. London: Chapman and Hall, 1990.
  • C. Heinrich. The mode functional is not elicitable. Biometrika, 101(1):245–251, 2014.
  • J. Huang, J. L. Horowitz, and F. Wei. Variable selection in nonparametric additive models. The Annals of Statistics, 38(4):2282–2313, 2010.
  • L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In International Conference on Machine Learning (ICML), pages 433–440, 2009.
  • K. Kandasamy and Y. Yu. Additive approximations in high dimensional nonparametric regression via the SALSA. In International Conference on Machine Learning (ICML), pages 69–78, 2016.
  • X. Li, L. Wang, and D. Nettleton. Sparse model identification and learning for ultra-high-dimensional additive partially linear models. Journal of Multivariate Analysis, 173:204–228, 2019.
  • J. Liu, Y. Ye, C. Shen, Y. Wang, and R. Erdelyi. A new tool for CME arrival time prediction using machine learning algorithms: CAT-PUMA. The Astrophysical Journal, 855(2):109–118, 2018.
  • S. Lv, H. Lin, H. Lian, and J. Huang. Oracle inequalities for sparse additive quantile regression in reproducing kernel hilbert space. The Annals of Statistics, 46(2):781–813, 2018.
  • L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70(1):53–71, 2008.
  • L. Meier, S. van de Geer, and P. Bühlmann. High-dimensional additive modeling. The Annals of Statistics, 37(6B):3779–3821, 2009.
  • M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing, 27(3):937–966, 2005.
  • P. Ochs, R. Ranftl, T. Brox, and T. Pock. Techniques for gradient based bilevel optimization with nonsmooth lower level problems. Journal of Mathematical Imaging and Vision, 56(2):175–194, 2016.
  • C. Pan and M. Zhu. Group additive structure identification for kernel nonparametric regression. In Advances in Neural Information Processing Systems (NIPS), pages 4907–4916. 2017.
  • E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
  • G. Raskutti, M. J. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Journal of Machine Learning Research, 13(2):389–427, 2012.
  • P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman. SpAM: sparse additive models. Journal of the Royal Statistical Society: Series B, 71:1009–1030, 2009.
  • S. J. Reddi, S. Sra, B. Poczos, and A. J. Smola. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In Advances in Neural Information Processing Systems (NIPS), pages 1145–1153. 2016.
  • T. W. Sager. Estimation of a multivariate mode. Annals of Statistics, 6(4):802–812, 1978.
  • N. Shervashidze and F. Bach. Learning the structure for structured sparsity. IEEE Transactions on Signal Processing, 63(18):4894–4902, 2015.
  • N. Simon, J. H. Friedman, T. Hastie, and R. Tibshirani. A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245, 2013.
  • C. J. Stone. Additive regression and other nonparametric models. The Annals of Statistics, 13(2):689–705, 1985.
  • G. Swirszcz and A. C. Lozano. Multi-level lasso for sparse multi-task regression. In International Conference on Machine Learning (ICML), pages 595–602, 2012.
  • R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.
  • Q. Van Nguyen. Forward-backward splitting with bregman distances. Vietnam Journal of Mathematics, 45(3):519–539, 2017.
  • X. Wang, H. Chen, W. Cai, D. Shen, and H. Huang. Regularized modal regression with applications in cognitive impairment prediction. In Advances in Neural Information Processing Systems (NIPS), pages 1448–1458, 2017.
  • Y. Wang, J. Liu, Y. Jiang, and R. Erdélyi. CME arrival time prediction using convolutional neural network. The Astrophysical Journal, 881(11):15, 2019.
  • Y. Wu and L. A. Stefanski. Automatic structure recovery for additive models. Biometrika, 102(2):381–395, 2015.
  • W. Yao and L. Li. A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41(3):656–671, 2013.
  • J. Yin, X. Chen, and E. P. Xing. Group sparse additive models. In International Conference on Machine Learning (ICML), pages 871–878, 2012.
  • M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.
  • M. Yuan and D. X. Zhou. Minimax optimal rates of estimation in high dimensional additive models. The Annals of Statistics, 44(6):2564–2593, 2016.
  • H. H. Zhang, G. Cheng, and Y. Liu. Linear or nonlinear? automatic structure discovery for partially linear models. Journal of the American Statistical Association, 106(495):1099–1112, 2011.
  • T. Zhao and H. Liu. Sparse additive machine. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1435–1443, 2012.
Authors
Yingjie Wang
Tieliang Gong
Yanhong Chen