# Piecewise Linear Regression via a Difference of Convex Functions

Ali Siahkamari

ICML, pp. 8895-8904, 2020.


Abstract:

We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. These are functions $f$ that may be represented as the difference $\phi_1 - \phi_2$ for a choice of convex functions $\phi_1, \phi_2$. The method proceeds by estimating piecewise-linear convex functi…
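The abstract's central object, a piecewise-linear DC function $f = \phi_1 - \phi_2$ with each part max-affine, can be sketched in a few lines. The function and parameter names below are illustrative, not the paper's implementation; each $\phi$ is represented as a list of (slope vector, offset) pairs:

```python
def max_affine(x, planes):
    """Evaluate a max-affine (convex, piecewise-linear) function
    max_k (<a_k, x> + b_k) at a point x, given (a_k, b_k) pairs."""
    return max(sum(ai * xi for ai, xi in zip(a, x)) + b for a, b in planes)

def dc_function(x, planes1, planes2):
    """A piecewise-linear DC function f = phi1 - phi2, with each phi
    a max-affine function."""
    return max_affine(x, planes1) - max_affine(x, planes2)

# Toy 1-D example (hypothetical parameters): |x| = max(x, -x) - 0
phi1 = [([1.0], 0.0), ([-1.0], 0.0)]   # max(x, -x), convex
phi2 = [([0.0], 0.0)]                  # constant zero, trivially convex
```

With these parts, `dc_function([2.0], phi1, phi2)` evaluates to `2.0`, i.e. `|2.0|`; any finite pointwise maximum of affine functions is convex, so the difference of two such maxima is a valid DC representation.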

Introduction
• The multivariate nonparametric regression problem is a fundamental statistical primitive, with a vast history and many approaches.
• Piecewise linear regression methods typically involve a prespecified grid, and the number of grid points, or knots, grows exponentially with dimension.
• Methods like splines typically require both stronger smoothness assumptions and a number of parameters that grows exponentially with dimension in order to avoid singularities in the estimate
Highlights
• The multivariate nonparametric regression problem is a fundamental statistical primitive, with a vast history and many approaches
• In order to contextualise the expressiveness of DC functions, we argue that the popular parametric class of ReLU neural networks can be represented by PLDC functions, and vice versa
• We reduce the problem by instead finding the values that a best-fit DC function must take at the datapoints x_i, and then fitting a PLDC function whose convex parts are max-affine over precisely n affine pieces on these values
• Our model results in linear or convex programs which can be solved efficiently even in high dimensions
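The claim that ReLU networks can be represented by PLDC functions can be checked numerically for the one-hidden-layer case: splitting the output weights by sign yields two nonnegative combinations of convex ReLU terms, hence two convex functions whose difference is the network. A minimal sketch (toy parameters, illustrative function names):

```python
def relu(z):
    return max(z, 0.0)

def relu_net(x, units):
    """One-hidden-layer ReLU network sum_j w_j * relu(<u_j, x> + c_j);
    `units` is a list of (w_j, u_j, c_j) triples."""
    return sum(w * relu(sum(ui * xi for ui, xi in zip(u, x)) + c)
               for w, u, c in units)

def as_dc_parts(x, units):
    """The same network written as phi1(x) - phi2(x): each part keeps
    only the ReLU terms whose weight has one sign, so it is a
    nonnegative combination of convex functions and hence convex."""
    pre = [(w, sum(ui * xi for ui, xi in zip(u, x)) + c) for w, u, c in units]
    phi1 = sum(w * relu(z) for w, z in pre if w > 0)
    phi2 = sum(-w * relu(z) for w, z in pre if w < 0)
    return phi1, phi2
```

For any input, `phi1 - phi2` from `as_dc_parts` agrees with `relu_net`, and both parts are piecewise-linear and convex, matching the PLDC characterization stated above.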
Methods
• The authors apply their method to both synthetic and real datasets for regression and multi-class classification.
• For the DC function fitting procedure, the authors note that the theoretical value for the regularization weight tends to oversmooth the estimators
• This behaviour is expected since the bound (4) is designed for the worst case.
• Fig. 1 presents both these estimates in a simple setting where one can visually observe the improved fit
• Note that this tuning is still minimal: the empirical discrepancy of DC1 gives a rough upper bound on the λ necessary, and the authors explore only a few different scales
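The tuning procedure the bullets describe amounts to a small grid search over a few scales below a rough upper bound on λ. A minimal sketch, with `fit` and `score` as hypothetical callables standing in for the solver and validation error (the bound itself comes from the empirical discrepancy of DC1, not from this snippet):

```python
def tune_lambda(fit, score, lam_max, scales=(1.0, 0.1, 0.01)):
    """Grid-search sketch: try a few scales of a rough upper bound
    lam_max on the regularization weight and keep the best one.
    `fit(lam)` returns a fitted model; `score(model)` returns its
    validation error (both hypothetical)."""
    best_err, best_lam = float("inf"), None
    for s in scales:
        lam = s * lam_max
        err = score(fit(lam))   # e.g. held-out error of the DC fit
        if err < best_err:
            best_err, best_lam = err, lam
    return best_lam
```

This matches the observation that the worst-case bound (4) oversmooths: the search only moves λ downward from its theoretical value, over a handful of scales.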
Conclusion
• The paper proposes an algorithm to learn piecewise linear functions using difference of max-affine functions.
• Wider context: non-parametric methods are most often utilised in settings with limited data in moderate dimensions.
• Within this context, along with strong accuracy, it is often desirable that the method be fast and interpretable, especially in relatively large dimensions.
• In settings with low dimensionality and small datasets, interpretability and speed take a backseat due to the small number of features, while accuracy becomes the critical requirement, promoting the use of kernelised or nearest-neighbour methods
Tables
• Table1: Datasets used for the regression task. The entries in the first column are linked to repository copies of the same. The final column indicates whether the DC function based method outperforms all competing methods
• Table2: Datasets used for the multi-class classification task. The entries in the first column are linked to the UCI machine learning repository copies of the same. The final column indicates whether the DC function based method outperforms all competing methods