# Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery

NeurIPS 2020


Abstract

Additive models have attracted much attention for high-dimensional regression estimation and variable selection. However, the existing models are usually limited to the single-task learning framework under the mean squared error (MSE) criterion, where the utilization of variable structure depends heavily on a priori knowledge among variab…


Introduction

- Additive models [14], as a nonparametric extension of linear models, have been extensively investigated in the machine learning literature [1, 5, 34, 44].
- Typical additive models are usually formulated under Tikhonov regularization schemes and fall into two categories: one focuses on recognizing dominant variables without considering the interaction among the variables [21, 28, 29, 46] and the other aims to screen informative variables at the group level, e.g., groupwise additive models [4, 42].
- The groupwise additive models depend heavily on a priori knowledge of variable structure.
- The authors consider a problem commonly encountered in multi-task learning, in which all tasks share an underlying variable structure and involve data with complex non-Gaussian noises, e.g., skewed noise.
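An additive model posits f(x) = α + Σ_j f_j(x_j), one univariate component per variable. As a minimal illustration of this hypothesis class (not the paper's estimator), the classical backfitting procedure can be sketched with simple per-coordinate polynomial smoothers; all names below are illustrative:

```python
import numpy as np

def backfit_additive(X, y, degree=3, n_iter=20):
    """Fit y ~ alpha + sum_j f_j(x_j) by backfitting with
    per-coordinate polynomial smoothers (illustrative sketch)."""
    n, d = X.shape
    alpha = y.mean()
    fits = np.zeros((d, n))                 # fitted component values f_j(x_ij)
    for _ in range(n_iter):
        for j in range(d):
            # partial residual: remove all components except the j-th
            r = y - alpha - fits.sum(axis=0) + fits[j]
            c = np.polyfit(X[:, j], r, degree)
            fj = np.polyval(c, X[:, j])
            fits[j] = fj - fj.mean()        # center for identifiability
    return alpha, fits
```

Each sweep refits one component against the partial residual of the others; centering keeps the decomposition identifiable up to the intercept.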

Highlights

- Additive models [14], as a nonparametric extension of linear models, have been extensively investigated in the machine learning literature [1, 5, 34, 44]
- We consider a problem commonly encountered in multi-task learning, in which all tasks share an underlying variable structure and involve data with complex non-Gaussian noises, e.g., skewed noise
- To relax the dependence on a prior structure and Gaussian noise, this paper proposes a class of Multi-task Additive Models (MAM) by integrating additive hypothesis space, mode-induced metric [6, 41, 10], and structure-based regularizer [12] into a bilevel learning framework
- This paper proposes the multi-task additive models to achieve robust estimation and automatic structure discovery
- As far as we know, it is novel to explore robust interpretable machine learning by integrating modal regression, additive models and multi-task learning together
- Our MAM can be applied to other fields, e.g., gene expression analysis and drug discovery
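The mode-induced metric replaces the MSE by the kernel density of the residuals at zero, which downweights gross outliers instead of squaring them. A minimal sketch of modal *linear* regression solved by half-quadratic iteration (reweighted least squares) with a Gaussian kernel is given below; this is an illustration of the metric, not the MAM solver, and all parameter choices are assumptions:

```python
import numpy as np

def modal_linear_fit(X, y, sigma=1.0, n_iter=50):
    """Modal linear regression sketch: choose w to maximize the kernel
    density of residuals at zero, mean(exp(-(y - Xw)^2 / (2*sigma^2))),
    via half-quadratic iteration. Illustrative only."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]          # warm start at OLS
    for _ in range(n_iter):
        r = y - X @ w
        k = np.exp(-r ** 2 / (2 * sigma ** 2))        # Gaussian kernel weights
        Xk = X * k[:, None]
        # weighted least squares: (X^T K X) w = X^T K y
        w = np.linalg.solve(X.T @ Xk + 1e-8 * np.eye(X.shape[1]), Xk.T @ y)
    return w
```

Points with large residuals receive near-zero weight, so a fraction of grossly corrupted responses barely moves the fit, unlike ordinary least squares.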

Methods

- The simulations compare MAM with BiGL, mGAM, GL, Lasso, and RMR under three noise settings: |V| = 0 (Gaussian noise), |V| = 5 (Gaussian noise), and |V| = 5 (Student noise), evaluated by ASE, TD, and WPI (SCP).
- For each task, the average absolute error (AAE) and the identified dominant groups are reported; e.g., MAM attains an AAE of 9.07 with dominant groups G1, G2, G7, versus 11.09 for BiGL, 12.16 for Lasso, and 12.02 for RMR.
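The structure-based regularizer used by MAM combines group-level and individual-level sparsity. In proximal-type solvers, the key building block for such a penalty is the proximal operator of the sparse group lasso, λ1‖w‖₁ + λ2 Σ_g ‖w_g‖₂, sketched below under the assumption of non-overlapping groups (this is the standard operator, not the paper's bilevel scheme):

```python
import numpy as np

def prox_sparse_group(w, groups, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*sum_g ||w_g||_2
    for non-overlapping groups: elementwise soft-thresholding
    followed by groupwise shrinkage. groups: list of index lists."""
    v = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)   # soft-threshold
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > lam2:                                  # group survives
            out[g] = (1.0 - lam2 / norm) * v[g]
    return out
```

Zeroed groups give group-level sparsity; the inner soft-threshold additionally zeroes individual features inside surviving groups, which is exactly the two-level sparsity pattern the summary attributes to MAM's regularizer.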

Results

- Table 1 summarizes the compared methods along six axes: evaluation criterion, objective function, robust estimation, sparsity on grouped features, sparsity on individual features, and variable structure discovery. RMR [38] is linear, single-task, and mode-induced; GroupSpAM [42] is additive, single-task, mean-induced, and convex; CGSI [26] is additive, single-task, mean-induced, and convex; BiGL [12] is linear.
- Denote f(t) and f∗(t) as the estimator and the ground-truth function of task t, 1 ≤ t ≤ T.
- **Evaluation criteria used here include**: True Deviation, TD = ‖f(t) − f∗(t)‖, and variable structure recovery.
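Under one plausible reading of these criteria, the per-task summaries reported in Tables 2–3 can be computed as below; the paper's exact definitions of ASE/AAE and of "dominant group" may differ, so treat the threshold and norms here as assumptions:

```python
import numpy as np

def task_metrics(preds, targets, coefs, groups):
    """Per-task summary loosely mirroring Tables 2-3: squared and
    absolute prediction errors, plus the dominant groups (those whose
    coefficient block carries most of the energy). Illustrative only."""
    ase = np.mean((preds - targets) ** 2)        # average square error
    aae = np.mean(np.abs(preds - targets))       # average absolute error
    energies = np.array([np.linalg.norm(coefs[g]) for g in groups])
    # call a group "dominant" if it holds over half the peak group energy
    dominant = [i for i, e in enumerate(energies) if e > 0.5 * energies.max()]
    return ase, aae, dominant
```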

Conclusion

- This paper proposes the multi-task additive models to achieve robust estimation and automatic structure discovery.
- It is interesting to investigate robust additive models for overlapping variable structure discovery [17].
- The positive impacts of this work are two-fold: 1) The authors' algorithmic framework paves a new way for mining the intrinsic feature structure among high-dimensional variables, and may be the stepping stone to further explore data-driven structure discovery with overlapping groups.
- There is a risk of producing an unstable estimation when facing ultra-high-dimensional data.

Summary


- Table 1: Algorithmic properties (✓: has the given property, ×: does not)
- Table 2: Performance comparisons on Example A (top) and Example B (bottom) w.r.t. different criteria
- Table 3: Average absolute error and dominant group for each task

Related Work

- There are some works on automatic structure discovery in additive models [26, 40] and partially linear models [19, 45]. Different from our MAM, these approaches are formulated under a single-task framework and the MSE criterion, which makes them sensitive to non-Gaussian noise and unable to tackle multi-task structure discovery directly. While some mode-based approaches have been designed for robust estimation, e.g., regularized modal regression (RMR) [38], none of them considers automatic structure discovery. Recently, an extension of group lasso was formulated for variable structure discovery [12]. Although this approach can induce data-driven sparsity at the group level, it is limited to linear mean regression and ignores sparsity with respect to individual features. To better highlight the novelty of MAM, its algorithmic properties are summarized in Table 1, compared with RMR [38], Group Sparse Additive Models (GroupSpAM) [42], Capacity-based Group Structure Identification (CGSI) [26], and Bilevel learning of Group Lasso (BiGL) [12].

Funding

- This work was supported by National Natural Science Foundation of China (NSFC) grants 11671161, 12071166, 61972188, and 41574181, and NSERC grant RGPIN-2016-05024.

Study Subjects and Analysis

ICME observations: 137

Interplanetary CMEs (ICMEs) data are provided in the Richardson and Cane List (http://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm). From this link, we collect 137 ICME observations from 1996 to 2016. The features of the CMEs are provided in the SOHO LASCO CME Catalog (https://cdaw.gsfc.nasa.gov/CME_list/).

References

- R. Agarwal, N. Frosst, X. Zhang, R. Caruana, and G. E. Hinton. Neural additive models: Interpretable machine learning with neural nets. arXiv:2004.13912v1, 2020.
- R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
- H. Chen, G. Liu, and H. Huang. Sparse shrunk additive models. In International Conference on Machine Learning (ICML), 2020.
- H. Chen, X. Wang, C. Deng, and H. Huang. Group sparse additive machine. In Advances in Neural Information Processing Systems (NIPS), pages 198–208. 2017.
- H. Chen, Y. Wang, F. Zheng, C. Deng, and H. Huang. Sparse modal additive model. IEEE Transactions on Neural Networks and Learning Systems, Doi: 10.1109/TNNLS.2020.3005144, 2020.
- Y. C. Chen, C. R. Genovese, R. J. Tibshirani, and L. Wasserman. Nonparametric modal regression. The Annals of Statistics, 44(2):489–514, 2016.
- B. Colson, P. Marcotte, and G. Savard. An overview of bilevel optimization. Annals of Operations Research, 153(1):235–256, 2007.
- L. Condat. Fast projection onto the simplex and the ℓ1-ball. Mathematical Programming, 158(1-2):575–585, 2016.
- T. Evgeniou and M. Pontil. Regularized multi–task learning. In ACM Conference on Knowledge Discovery and Data Mining, pages 109–117, 2004.
- Y. Feng, J. Fan, and J. Suykens. A statistical learning approach to modal regression. Journal of Machine Learning Research, 21(2):1–35, 2020.
- L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In International Conference on Machine learning (ICML), pages 1563–1572, 2018.
- J. Frecon, S. Salzo, and M. Pontil. Bilevel learning of the group lasso structure. In Advances in Neural Information Processing Systems (NIPS), pages 8301–8311, 2018.
- J. H. Friedman, T. Hastie, and R. Tibshirani. A note on the group lasso and a sparse group lasso. arXiv:1001.0736, 2010.
- T. J. Hastie and R. J. Tibshirani. Generalized additive models. London: Chapman and Hall, 1990.
- C. Heinrich. The mode functional is not elicitable. Biometrika, 101(1):245–251, 2014.
- J. Huang, J. L. Horowitz, and F. Wei. Variable selection in nonparametric additive models. The Annals of Statistics, 38(4):2282–2313, 2010.
- L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In International Conference on Machine Learning (ICML), pages 433–440, 2009.
- K. Kandasamy and Y. Yu. Additive approximations in high dimensional nonparametric regression via the SALSA. In International Conference on Machine Learning (ICML), pages 69–78, 2016.
- X. Li, L. Wang, and D. Nettleton. Sparse model identification and learning for ultra-high-dimensional additive partially linear models. Journal of Multivariate Analysis, 173:204–228, 2019.
- J. Liu, Y. Ye, C. Shen, Y. Wang, and R. Erdelyi. A new tool for CME arrival time prediction using machine learning algorithms: CAT-PUMA. The Astrophysical Journal, 855(2):109–118, 2018.
- S. Lv, H. Lin, H. Lian, and J. Huang. Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space. The Annals of Statistics, 46(2):781–813, 2018.
- L. Meier, S. V. De Geer, and P. Buhlmann. The group lasso for logistic regression. Journal of The Royal Statistical Society Series B-statistical Methodology, 70(1):53–71, 2008.
- L. Meier, S. V. D. Geer, and P. Buhlmann. High-dimensional additive modeling. The Annals of Statistics, 37(6B):3779–3821, 2009.
- M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing, 27(3):937–966, 2005.
- P. Ochs, R. Ranftl, T. Brox, and T. Pock. Techniques for gradient based bilevel optimization with nonsmooth lower level problems. Journal of Mathematical Imaging and Vision, 56(2):175–194, 2016.
- C. Pan and M. Zhu. Group additive structure identification for kernel nonparametric regression. In Advances in Neural Information Processing Systems (NIPS), pages 4907–4916. 2017.
- E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
- G. Raskutti, M. J. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Journal of Machine Learning Research, 13(2):389–427, 2012.
- P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman. SpAM: sparse additive models. Journal of the Royal Statistical Society: Series B, 71:1009–1030, 2009.
- S. J. Reddi, S. Sra, B. Poczos, and A. J. Smola. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In Advances in Neural Information Processing Systems (NIPS), pages 1145–1153. 2016.
- T. W. Sager. Estimation of a multivariate mode. Annals of Statistics, 6(4):802–812, 1978.
- N. Shervashidze and F. Bach. Learning the structure for structured sparsity. IEEE Transactions on Signal Processing, 63(18):4894–4902, 2015.
- N. Simon, J. H. Friedman, T. Hastie, and R. Tibshirani. A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245, 2013.
- C. J. Stone. Additive regression and other nonparametric models. The Annals of Statistics, 13(2):689–705, 1985.
- G. Swirszcz and A. C. Lozano. Multi-level lasso for sparse multi-task regression. In International Conference on Machine Learning (ICML), pages 595–602, 2012.
- R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
- Q. Van Nguyen. Forward-backward splitting with Bregman distances. Vietnam Journal of Mathematics, 45(3):519–539, 2017.
- X. Wang, H. Chen, W. Cai, D. Shen, and H. Huang. Regularized modal regression with applications in cognitive impairment prediction. In Advances in Neural Information Processing Systems (NIPS), pages 1448–1458, 2017.
- Y. Wang, J. Liu, Y. Jiang, and R. Erdélyi. CME arrival time prediction using convolutional neural network. The Astrophysical Journal, 881(11):15, 2019.
- Y. Wu and L. A. Stefanski. Automatic structure recovery for additive models. Biometrika, 102(2):381–395, 2015.
- W. Yao and L. Li. A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41(3):656–671, 2013.
- J. Yin, X. Chen, and E. P. Xing. Group sparse additive models. In International Conference on Machine Learning (ICML), pages 871–878, 2012.
- M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of The Royal Statistical Society Series B-statistical Methodology, 68(1):49–67, 2006.
- M. Yuan and D. X. Zhou. Minimax optimal rates of estimation in high dimensional additive models. The Annals of Statistics, 44(6):2564–2593, 2016.
- H. H. Zhang, G. Cheng, and Y. Liu. Linear or nonlinear? automatic structure discovery for partially linear models. Journal of the American Statistical Association, 106(495):1099–1112, 2011.
- T. Zhao and H. Liu. Sparse additive machine. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1435–1443, 2012.
