## AI Digest

An AI-extracted summary of this paper.

# Towards Convergence Rate Analysis of Random Forests for Classification

NeurIPS 2020

Abstract

Random forests have been one of the successful ensemble algorithms in machine learning. The basic idea is to construct a large number of random trees individually and make prediction based on an average of their predictions. The great successes have attracted much attention on the consistency of random forests, mostly focusing on regressi…


Introduction

- Since the pioneering work [12], random forests have been recognized as one of the most successful algorithms for classification and regression: they construct a large number of random trees individually and make predictions based on an average of their predictions.
- The authors first present the following relationship between the convergence rate of the random forests classifier and that of an individual random tree classifier; the detailed proof is given in Appendix A.
- Lemma 1: Let f_m(x) be the random forests classifier given by Eqn. (1), and let f_{S_n,Θ}(x) denote the classifier of an individual tree with respect to the random vector Θ.
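The forest prediction in Eqn. (1) is a majority vote over the individual tree predictions, which is the quantity Lemma 1 relates to the single-tree rate. A minimal sketch of this voting rule (the trees themselves are not modeled; `tree_votes` is a hypothetical list of ±1 tree outputs at a query point):

```python
def forest_predict(tree_votes):
    """Majority vote over the m individual tree predictions in {-1, +1}.

    tree_votes: hypothetical list of the tree outputs f_{S_n,Theta_j}(x)
    at a query point x; how each tree is grown is left abstract here.
    """
    # f_m(x) is the sign of the average vote; ties break toward +1
    # here as an arbitrary convention.
    return 1 if sum(tree_votes) >= 0 else -1

print(forest_predict([1, 1, -1]))  # two of three trees vote +1 -> prints 1
```

Because the forest only aggregates votes, its excess risk can be controlled by that of a single random tree, which is the content of Lemma 1.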

Highlights

- Since the pioneering work [12], random forests have been recognized as one of the most successful algorithms for classification and regression: they construct a large number of random trees individually and make predictions based on an average of their predictions.
- The authors present a convergence rate of pure random forests with midpoint splits for classification. Theorem 2: Let f_m(x) be the random forests classifier obtained by applying pure random trees with midpoint splits to training data S_n, with k leaves (k ≥ 2).
- This work presents convergence rates of random forests for classification based on different analysis techniques, and it would be interesting to study the convergence rates of other variants of random forests along the same lines.
- The authors present the first finite-sample convergence rate O(n^(−1/(8d+2))) for pure random forests, as well as a convergence rate O(n^(−1/(d+2))(ln n)^(1/(d+2))) for a simplified variant of Breiman's original random forests [12], which reaches the minimax rate of the optimal plug-in classifier under the L-Lipschitz assumption, up to a factor of (ln n)^(1/(d+2)).
- It would be interesting to extend this work to multi-class learning, where the challenges lie in the theoretical analysis of the prediction margin f(x, y) − max_{i≠y} f(x, i) and Lipschitz assumptions over multiple class-conditional distributions.
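To see how the stated rates compare, one can evaluate the exponents 1/(8d+2) (pure random forests), 1/(3.87d+2) (midpoint splits), and the minimax 1/(d+2) at a concrete dimension. A quick check, not taken from the paper; d = 5 is an arbitrary example choice:

```python
def rate_exponent(denom_coeff, d):
    """Exponent alpha in a convergence rate of the form O(n^(-alpha)),
    where alpha = 1 / (denom_coeff * d + 2)."""
    return 1.0 / (denom_coeff * d + 2)

d = 5  # arbitrary example dimension
pure = rate_exponent(8, d)         # Theorem 1: 1/(8d+2)
midpoint = rate_exponent(3.87, d)  # Theorem 2: 1/(3.87d+2)
minimax = rate_exponent(1, d)      # minimax rate: 1/(d+2)

# A larger exponent means a faster rate: midpoint splits improve on
# generic pure random forests, and the minimax exponent is largest.
print(pure, midpoint, minimax)
```

For d = 5 this gives roughly 0.024 < 0.047 < 0.143, matching the ordering of the three rates discussed above.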

Results

- Theorem 1: Let f_m(x) be the random forests classifier obtained by applying pure random trees to training data S_n, with k leaves (k ≥ 2).
- The authors obtain a convergence rate O(n^(−1/(8d+2))) of pure random forests for classification by selecting the leaves parameter k = O(n^(4d/(4d+1))).
- The authors first derive the convergence rate of the individual random tree classifier f_{S_n,Θ}(x), and complete the proof by combining it with Lemma 1.
- The authors present a convergence rate of pure random forests with midpoint splits for classification. Theorem 2: Let f_m(x) be the random forests classifier obtained by applying pure random trees with midpoint splits to training data S_n, with k leaves (k ≥ 2).
- The authors obtain a convergence rate O(n^(−1/(3.87d+2))) of pure random forests with midpoint splits for classification by selecting the leaves parameter k = O(n^(3.87d/(3.87d+2))).
- The authors present a convergence rate of the simplified variant of random forests for classification. Theorem 3: For k ≥ 2 and n ≥ 4, let f_m(x) be the random forests classifier obtained by applying Algorithm 1 to training data S_n, with k leaves.
- The simplified variant of random forests reaches the minimax convergence rate of the optimal plug-in classifiers up to a logarithmic factor, even though random forests are not plug-in classifiers: they take a majority vote over the predictions of individual random trees rather than estimating the conditional probability.
- The authors achieve a tighter convergence rate O(√(ln n / n)) for the simplified variant of random forests for classification, which is independent of the dimension d.
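A pure random tree with midpoint splits partitions the input cube independently of the labels: repeatedly pick a leaf cell and a coordinate at random, split the cell at that coordinate's midpoint, and stop at k leaves; prediction is a majority vote inside the query point's cell. The sketch below is a simplified reading of that construction (the cell- and coordinate-selection distributions here are assumptions, not the paper's exact randomization):

```python
import random

def grow_midpoint_tree(k, d, rng):
    """Grow a label-independent partition of [0, 1]^d into k cells.

    Each split picks a random existing cell and a random coordinate,
    then cuts the cell at the midpoint of that coordinate's interval.
    """
    cells = [[(0.0, 1.0)] * d]  # a cell is a list of per-coordinate intervals
    while len(cells) < k:
        cell = cells.pop(rng.randrange(len(cells)))
        j = rng.randrange(d)
        lo, hi = cell[j]
        mid = (lo + hi) / 2
        left, right = list(cell), list(cell)
        left[j], right[j] = (lo, mid), (mid, hi)
        cells.extend([left, right])
    return cells

def cell_of(cells, x):
    """Return the index of the (first) cell containing point x."""
    for i, cell in enumerate(cells):
        if all(lo <= xj <= hi for xj, (lo, hi) in zip(x, cell)):
            return i
    raise ValueError("x outside [0, 1]^d")

def tree_predict(cells, data, x):
    """Majority vote among training labels falling in x's cell.

    data: list of (point, label) pairs with labels in {-1, +1}.
    Empty cells default to +1 as an arbitrary tie-breaking convention.
    """
    i = cell_of(cells, x)
    votes = [y for (p, y) in data if cell_of(cells, p) == i]
    return 1 if sum(votes) >= 0 else -1
```

Theorem 2's choice k = O(n^(3.87d/(3.87d+2))) then prescribes how many cells to grow for a given sample size n, balancing cell diameter against the number of training points per cell.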

Conclusion

- Mourtada et al. [40] presented the consistency of online Mondrian forest classifiers according to [22, Theorem 6.1], and derived the minimax rate O(n^(−1/(d+2))) for plug-in classifiers based on the estimation of conditional probability; that is, they took an average of the conditional probabilities calculated by individual Mondrian trees.
- The authors present the first finite-sample convergence rate O(n^(−1/(8d+2))) for pure random forests, as well as a convergence rate O(n^(−1/(d+2))(ln n)^(1/(d+2))) for a simplified variant of Breiman's original random forests [12], which reaches the minimax rate of the optimal plug-in classifier under the L-Lipschitz assumption, up to a factor of (ln n)^(1/(d+2)).
- This is purely theoretical work, with no particular application foreseen.


Related Work

- For random forests, a large number of variants have been developed for different problems and settings in the literature over the past decades. Geurts et al. [27] introduced extremely randomized trees, and Amaratunga et al. [1] provided enriched random forests for DNA microarray data with huge numbers of features. Menze et al. [38] presented oblique random forests for multivariate trees by explicitly learning the optimal split directions with linear discriminative models. Clémençon et al. [14] introduced ranking forests based on aggregation and feature randomization principles for bipartite ranking. Athey et al. [4] developed a flexible and computationally efficient algorithm for generalized random forests. A general framework covering various splitting criteria for random forests based on loss functions is presented in [53]. Zhou and Feng [55, 56] proposed gcForest, whose performance is highly competitive with deep neural networks. Online random forests have also been developed with strong theoretical guarantees [19, 33, 40, 49].

Funding

- This research was supported by the NSFC (61921006, 61876078) and the Fundamental Research Funds for the Central Universities (14380003).

References

- D. Amaratunga, J. Cabrera, and Y.-S. Lee. Enriched random forests. Bioinformatics, 24(18):2010–2014, 2008.
- Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9(7):1545–1588, 1997.
- S. Arlot and R. Genuer. Analysis of purely random forests bias. CoRR/Abstract, 1407.3939, 2014.
- S. Athey, J. Tibshirani, and S. Wager. Generalized random forests. Annals of Statistics, 47(2):1148–1178, 2019.
- J.-Y. Audibert and A. Tsybakov. Fast learning rates for plug-in classifiers. Annals of Statistics, 35(2):608–633, 2007.
- S. Basu, K. Kumbier, J. Brown, and B. Yu. Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences, 115(8):1943–1948, 2018.
- M. Belgiu and L. Dragut. Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114:24–31, 2016.
- G. Biau. Analysis of a random forests model. Journal of Machine Learning Research, 13:1063–1095, 2012.
- G. Biau, L. Devroye, and G. Lugosi. Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research, 9:2015–2033, 2008.
- G. Biau and E. Scornet. A random forest guided tour. Test, 25(2):197–227, 2016.
- L. Breiman. Some infinity theory for predictor ensembles. Technical Report 579, Statistics Department, UC Berkeley, Berkeley, CA, 2000.
- L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
- L. Breiman. Consistency for a simple model of random forests. Technical Report 670, Statistics Department, UC Berkeley, Berkeley, CA, 2004.
- S. Clémençon, M. Depecker, and N. Vayatis. Ranking forests. Journal of Machine Learning Research, 14:39–73, 2013.
- T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.
- A. Criminisi and J. Shotton. Decision Forests for Computer Vision and Medical Image Analysis. Springer Science & Business Media, 2013.
- A. Criminisi, J. Shotton, and E. Konukoglu. Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Foundations and Trends in Computer Graphics and Vision, 7(2-3):81–227, 2012.
- D. Cutler, T. Edwards Jr, K. Beard, A. Cutler, K. Hess, J. Gibson, and J. Lawler. Random forests for classification in ecology. Ecology, 88(11):2783–2792, 2007.
- M. Denil, D. Matheson, and N. de Freitas. Consistency of online random forests. In Proceedings of the 30th International Conference on Machine Learning, pages 1256–1264, Atlanta, GA, 2013.
- M. Denil, D. Matheson, and N. de Freitas. Narrowing the gap: Random forests in theory and in practice. In Proceedings of the 31st International Conference on Machine Learning, pages 665–673, Beijing, China, 2014.
- L. Devroye. A note on the height of binary search trees. Journal of the ACM, 33(3):489–498, 1986.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
- T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 2000.
- M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15(1):3133–3181, 2014.
- R. Genuer. Variance reduction in purely random forests. Journal of Nonparametric Statistics, 24(3):543–562, 2012.
- R. Genuer, J. Poggi, and C. Tuleau. Random forests: Some methodological insights. CoRR/Abstract, 0811.3619, 2008.
- P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
- J. Goetz, A. Tewari, and P. Zimmerman. Active learning for non-parametric regression using purely random trees. In Advances in Neural Information Processing Systems 31, pages 2537–2546. MIT Press, Cambridge, MA, 2018.
- T. K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.
- J. Kazemitabar, A. Amini, A. Bloniarz, and A. Talwalkar. Variable importance using decision trees. In Advances in Neural Information Processing Systems 30, pages 426–435. MIT Press, Cambridge, MA, 2017.
- J. Klusowski. Sharp analysis of a simple model for random forests. CoRR/Abstract, 1805.02587, 2018.
- S. Kwok and C. Carter. Multiple decision trees. In Proceedings of the 4th Annual Conference on Uncertainty in Artificial Intelligence, pages 327–338, Minneapolis, MN, 1988.
- B. Lakshminarayanan, D. Roy, and Y. Teh. Mondrian forests: Efficient online random forests. In Advances in Neural Information Processing Systems 27, pages 3140–3148. MIT Press, Cambridge, MA, 2014.
- X. Li, Y. Wang, S. Basu, K. Kumbier, and B. Yu. A debiased MDI feature importance measure for random forests. In Advances in Neural Information Processing Systems 32, pages 8047–8057. MIT Press, Cambridge, MA, 2019.
- Y. Lin and Y. Jeon. Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101(474):578–590, 2006.
- G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts. Understanding variable importances in forests of randomized trees. In Advances in Neural Information Processing Systems 26, pages 431–439. MIT Press, Cambridge, MA, 2013.
- N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7:983–999, 2006.
- B. Menze, M. Kelm, D. Splitthoff, U. Koethe, and F. Hamprecht. On oblique random forests. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 453–469, Athens, Greece, 2011.
- M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
- J. Mourtada, S. Gaïffas, and E. Scornet. Universal consistency and minimax rates for online mondrian forests. In Advances in Neural Information Processing Systems 30, pages 3758–3767. MIT Press, Cambridge, MA, 2017.
- B. Reed. The height of a random binary search tree. Journal of the ACM, 50(3):306–332, 2003.
- J. Rodriguez, L. Kuncheva, and C. Alonso. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619–1630, 2006.
- E. Scornet. On the asymptotics of random forests. Journal of Multivariate Analysis, 146:72–83, 2016.
- E. Scornet, G. Biau, and J. Vert. Consistency of random forests. Annals of Statistics, 43(4):1716–1741, 2015.
- S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge, 2014.
- J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore. Real-time human pose recognition in parts from single depth images. Communications of the ACM, 56(1):116–124, 2013.
- V. Svetnik, A. Liaw, C. Tong, J. Culberson, R. Sheridan, and B. Feuston. Random forest: A classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences, 43(6):1947–1958, 2003.
- M. Taddy, R. Gramacy, and N. Polson. Dynamic trees for learning and design. Journal of the American Statistical Association, 106(493):109–123, 2011.
- C. Tang, D. Garreau, and U. von Luxburg. When do random forests fail? In Advances in Neural Information Processing Systems 31, pages 2983–2993. MIT Press, Cambridge, MA, 2018.
- S. Wager. Asymptotic theory for random forests. CoRR/Abstract, 1405.0352, 2014.
- S. Wager, T. Hastie, and B. Efron. Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research, 15(1):1625–1651, 2014.
- B.-B. Yang, W. Gao, and M. Li. On the robust splitting criterion of random forest. In Proceedings of the 19th IEEE International Conference on Data Mining, pages 1420–1425, Beijing, China, 2019.
- Y. Yang. Minimax nonparametric classification - part I: Rates of convergence. IEEE Transactions on Information Theory, 45(7):2271–2284, 1999.
- Z.-H. Zhou and J. Feng. Deep forest: Towards an alternative to deep neural networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 3553–3559, Melbourne, Australia, 2017.
- Z.-H. Zhou and J. Feng. Deep forest. National Science Review, 6(1):74–86, 2019.
