Piecewise Linear Regression via a Difference of Convex Functions

Ali Siahkamari
Aditya Gangrade

ICML, pp. 8895-8904, 2020.


Abstract:

We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. These are functions $f$ that may be represented as the difference $\phi_1 - \phi_2$ for a choice of convex functions $\phi_1, \phi_2$. The method proceeds by estimating piecewise-linear convex functions…
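Concretely, the model class consists of differences of max-affine functions (pointwise maxima of finitely many affine functions, hence convex and piecewise linear). Below is a minimal, illustrative sketch of evaluating such a function; it is not the authors' code, and the array names and shapes are assumptions for this example.

```python
# Evaluate a piecewise-linear DC function f = phi1 - phi2, where each phi_k
# is max-affine. Parameter names (A1, b1, A2, b2) are illustrative.
import numpy as np

def max_affine(x, A, b):
    """phi(x) = max_j (<a_j, x> + b_j) for a batch of points.

    x: (n, d) points; A: (m, d) slopes; b: (m,) intercepts.
    """
    return (x @ A.T + b).max(axis=1)

def dc_function(x, A1, b1, A2, b2):
    """f(x) = phi1(x) - phi2(x), a piecewise-linear DC function."""
    return max_affine(x, A1, b1) - max_affine(x, A2, b2)

# Example: |x| = max(x, -x) is convex; subtracting phi2 = 0 keeps it DC.
x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
A1, b1 = np.array([[1.0], [-1.0]]), np.zeros(2)   # phi1(x) = |x|
A2, b2 = np.array([[0.0]]), np.zeros(1)           # phi2(x) = 0
print(dc_function(x, A1, b1, A2, b2))             # -> |x| on the grid
```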

Introduction
  • The multivariate nonparametric regression problem is a fundamental statistical primitive, with a vast history and many approaches.
  • Piecewise linear regression methods typically involve a prespecified grid, and the number of grid points, or knots, grows exponentially with dimension.
  • Methods like splines typically require both stronger smoothness assumptions and a number of parameters that grows exponentially with the dimension in order to avoid singularities in the estimate.
Highlights
  • The multivariate nonparametric regression problem is a fundamental statistical primitive, with a vast history and many approaches
  • In order to contextualise the expressiveness of DC functions, we argue that the popular parametric class of ReLU neural networks can be represented by PLDC functions, and vice versa (see the sketch after this list).
  • We reduce the problem by instead finding the values that a best-fit DC function must take at the data points $x_i$, and then fitting a PLDC function whose convex parts are max-affine over precisely $n$ affine functions.
  • Our model results in linear or convex programs that can be solved efficiently even in high dimensions.
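As a concrete illustration of the ReLU-to-PLDC direction, the sketch below (not the authors' construction; all names are illustrative) splits the output weights of a one-hidden-layer ReLU network by sign, giving two convex piecewise-linear parts whose difference recovers the network.

```python
# A one-hidden-layer ReLU network is PLDC: a nonnegative combination of
# convex ReLU units is convex and piecewise linear, so splitting the output
# weights by sign writes the network as phi1 - phi2.
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 8
A, b = rng.normal(size=(m, d)), rng.normal(size=m)   # hidden layer
w = rng.normal(size=m)                               # output weights

def relu_net(x):
    return np.maximum(x @ A.T + b, 0.0) @ w

def convex_part(x, mask):
    # Nonnegative combination of convex ReLU units -> convex, piecewise linear.
    return np.maximum(x @ A.T + b, 0.0) @ (np.abs(w) * mask)

x = rng.normal(size=(5, d))
phi1 = convex_part(x, w > 0)
phi2 = convex_part(x, w < 0)
assert np.allclose(relu_net(x), phi1 - phi2)   # f = phi1 - phi2
```

The converse direction follows because each max-affine part can itself be computed with ReLU units, e.g. via the identity max(u, v) = u + ReLU(v - u) applied recursively.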
Methods
  • The authors apply the method to both synthetic and real datasets for regression and multi-class classification.
  • For the DC function fitting procedure, the authors note that the theoretical value for the regularization weight tends to oversmooth the estimators.
  • This behaviour is expected since the bound (4) is designed for the worst case.
  • Fig. 1 presents both these estimates in a simple setting where one can visually observe the improved fit.
  • Note that this tuning is still minimal: the empirical discrepancy of DC1 fixes a rough upper bound on the λ necessary, and the authors explore only a few different scales (a sketch of such a tuning loop follows this list).
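A minimal sketch of this light tuning, treating the bound from DC1 as a rough upper bound `lam_max` and trying only a few smaller scales on a holdout set. Here `fit_dc(X, y, lam)` is a hypothetical stand-in for the paper's convex-program fitting routine, not its actual API.

```python
# Holdout grid search over a handful of scales of lam_max, as described above.
import numpy as np

def tune_lambda(fit_dc, X, y, lam_max, scales=(1.0, 0.1, 0.01, 0.001)):
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    tr, va = idx[: int(0.8 * len(X))], idx[int(0.8 * len(X)):]
    best_err, best_lam = np.inf, None
    for s in scales:                      # only a few scales, per the text
        model = fit_dc(X[tr], y[tr], lam=s * lam_max)
        err = np.mean((model(X[va]) - y[va]) ** 2)   # holdout MSE
        if err < best_err:
            best_err, best_lam = err, s * lam_max
    return best_lam
```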
Conclusion
  • The paper proposes an algorithm to learn piecewise linear functions using a difference of max-affine functions.
  • Wider context: non-parametric methods are most often utilised in settings with limited data in moderate dimensions.
  • Within this context, along with strong accuracy, it is often desired that the method be fast and interpretable, especially in relatively large dimensions.
  • In settings with low dimensionality and small datasets, interpretability and speed take a backseat due to the small number of features, while accuracy becomes the critical requirement, promoting the use of kernelised or nearest-neighbour methods.
Tables
  • Table 1: Datasets used for the regression task. The entries in the first column link to repository copies of the datasets. The final column indicates whether the DC-function-based method outperforms all competing methods.
  • Table 2: Datasets used for the multi-class classification task. The entries in the first column link to the UCI Machine Learning Repository copies of the datasets. The final column indicates whether the DC-function-based method outperforms all competing methods.