# Universal guarantees for decision tree induction via a higher-order splitting criterion

NeurIPS 2020


We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple…


• Decision trees have long been a workhorse of machine learning. Their simple hierarchical structure makes them appealing in terms of interpretability and explanatory power.
• The authors design a simple top-down decision tree learning algorithm based on the new splitting criterion (⋆), and show that it achieves provable guarantees for all target functions f : {±1}ⁿ → {±1}, i.e. a universal guarantee of the form (♦).
• Given a partial tree T◦ and a target function f : {±1}ⁿ → {±1}, there is a canonical completion of T◦ that minimizes classification error error_f(T): label every leaf ℓ of T◦ with sign(E[f_ℓ]), where f_ℓ denotes the subfunction of f obtained by restricting its attributes according to the path that leads to ℓ.
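The canonical completion above is easy to sketch. Below is a minimal Monte-Carlo estimate of a leaf's label sign(E[f_ℓ]), assuming f is available as a black-box function on {-1,1}ⁿ; the `maj3` example function and the dict-based restriction format are illustrative choices, not from the paper:

```python
import random

def sign(v):
    return 1 if v >= 0 else -1

def leaf_label(f, restriction, n, samples=2000):
    """Estimate sign(E[f_l]) for the subfunction f_l reached by `restriction`,
    a dict mapping attribute index -> fixed value in {-1, +1}."""
    total = 0
    for _ in range(samples):
        x = [random.choice([-1, 1]) for _ in range(n)]
        for i, b in restriction.items():
            x[i] = b                       # fix attributes along the path to the leaf
        total += f(x)
    return sign(total / samples)

# Example: f = majority of the first three attributes.
maj3 = lambda x: sign(x[0] + x[1] + x[2])
# On the path x0 = +1, x1 = +1 the subfunction is constantly +1:
print(leaf_label(maj3, {0: 1, 1: 1}, n=5))   # → 1
```

Labeling every leaf this way minimizes classification error because each leaf independently contributes the error of its own subfunction, and sign(E[f_ℓ]) is the best constant predictor for f_ℓ under the uniform distribution.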

• We depart from the overall theme of prior works: instead of analyzing the performance of impurity-based heuristics when run on restricted classes of target functions, we propose a simple and efficient extension of these heuristics, and show that it achieves the sought-for universal guarantee (♦)
• Theorem 1 should be contrasted with the impossibility result discussed in the introduction, which shows that there are very simple target functions for which opt_s = 0, and yet any impurity-based heuristic may build a complete tree of size Ω(2ⁿ) (in time Ω(2ⁿ)) before achieving any non-trivial error
• We will show that Theorem 1 follows as a consequence of the following more general result, which gives a performance guarantee for BuildStabilizingDT that scales according to the noise sensitivity of the target function f
• We have designed a new top-down decision tree learning algorithm that achieves provable guarantees for all target functions f : {±1}n → {±1} with respect to the uniform distribution. This circumvents impossibility results showing that such a universal guarantee is provably unachievable by existing top-down heuristics such as ID3, C4.5, and CART
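To make the "extension of existing heuristics" concrete, here is a generic greedy top-down induction skeleton of the kind ID3, C4.5, and CART share, with the splitting score as the single pluggable piece. It uses the classic Gini impurity; the paper's higher-order criterion (⋆) is not reproduced here — it would replace `split_score` with a noise-sensitivity-based score. All function names are my own:

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of ±1 labels (the splitting score CART uses)."""
    if not labels:
        return 0.0
    p = labels.count(1) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(data, attrs, depth):
    """Greedy top-down induction skeleton (the structure shared by ID3, C4.5,
    and CART). data is a list of (x, y) pairs, x a ±1 tuple, y in {-1, +1}."""
    labels = [y for _, y in data]
    if depth == 0 or not attrs or len(set(labels)) <= 1:
        return 1 if sum(labels) >= 0 else -1   # canonical leaf: sign of the mean

    def split_score(i):
        # Weighted impurity of the two subsets induced by querying attribute i.
        left = [y for x, y in data if x[i] == -1]
        right = [y for x, y in data if x[i] == +1]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(data)

    i = min(attrs, key=split_score)            # greedy choice of split attribute
    rest = [a for a in attrs if a != i]
    return (i,
            build_tree([(x, y) for x, y in data if x[i] == -1], rest, depth - 1),
            build_tree([(x, y) for x, y in data if x[i] == +1], rest, depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = left if x[i] == -1 else right
    return tree

# Example: learn the dictator function f(x) = x0 from its full truth table.
data = [(x, x[0]) for x in product([-1, 1], repeat=3)]
tree = build_tree(data, attrs=[0, 1, 2], depth=2)
```

The impossibility results quoted above apply to any choice of impurity function in this skeleton; the paper's point is that only the scoring rule needs to change to obtain the universal guarantee (♦).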

• Recent work of Blanc, Lange, and Tan [BLT20b] gives a top-down algorithm for learning decision trees that achieves provable guarantees for all target functions f .
• Fact 3.1 motivates the following question: given a partial decision tree T◦ and a target function f, is there a leaf ℓ⋆ …
• For each j, the authors write T◦_j to denote the size-j partial decision tree that BuildStabilizingDT constructs after j − 1 iterations.
• The authors will use the potential function, the noise sensitivity of f with respect to a partial decision tree T ◦ (Definition 4), to bound this number.
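The potential function driving the analysis is the standard notion of noise sensitivity, NS_δ(f) = Pr[f(x) ≠ f(y)], where x is uniform on {-1,1}ⁿ and y is a δ-noisy copy of x. A minimal Monte-Carlo estimator (the function name is my own) makes the quantity concrete:

```python
import random

def noise_sensitivity(f, n, delta, trials=20000):
    """Monte-Carlo estimate of NS_delta(f) = Pr[f(x) != f(y)], where x is
    uniform on {-1,1}^n and y flips each coordinate of x independently
    with probability delta."""
    disagree = 0
    for _ in range(trials):
        x = [random.choice([-1, 1]) for _ in range(n)]
        y = [-xi if random.random() < delta else xi for xi in x]
        disagree += f(x) != f(y)
    return disagree / trials

# Sanity checks: a dictator f(x) = x0 has NS_delta(f) = delta exactly, while
# the parity of n bits has NS_delta(f) = (1 - (1 - 2*delta)^n) / 2, which
# approaches the trivial 1/2 as n grows — parity is maximally noise sensitive.
```

Viewing top-down tree construction as noise stabilization, each split is chosen to drive this potential down, so the noise sensitivity of the restricted target bounds how many splits the algorithm can still need.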

• The analysis of the algorithm draws on Fourier-analytic techniques, and is based on a view of the top-down decision tree construction process as a noise stabilization procedure.
• The focus of the work has been on achieving a universal guarantee: the new algorithm constructs, for all target functions f and size parameters s, a decision tree hypothesis of quasipoly(s) size that achieves error close to that of the best size-s decision tree for f.
• Are there broad and natural subclasses of target functions for which the quasipoly(s) bound on the size of the decision tree hypothesis can be improved to poly(s), or even O(s)?

• Guy, Jane, and Li-Yang were supported by NSF award CCF-192179 and NSF CAREER award CCF-1942123
• Neha was supported by NSF award 1704417

• [BDM19a] Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. ArXiv, abs/1906.08654, 2019.
• [BDM19b] Alon Brutzkus, Amit Daniely, and Eran Malach. On the optimality of trees generated by ID3. ArXiv, abs/1907.05444, 2019.
• [BEK02] Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
• [BLT20a] Guy Blanc, Jane Lange, and Li-Yang Tan. Provable guarantees for decision tree induction: the agnostic setting. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. Available at https://arxiv.org/abs/2006.00743.
• [BLT20b] Guy Blanc, Jane Lange, and Li-Yang Tan. Top-down induction of decision trees: rigorous guarantees and inherent limitations. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 1–44, 2020.
• [Blu92] Avrim Blum. Rank-r decision trees are a subclass of r-decision lists. Information Processing Letters, 42(4):183–185, 1992.
• [Bre17] Leo Breiman. Classification and Regression Trees. Routledge, 2017.
• [CM19] Sitan Chen and Ankur Moitra. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 869–880, 2019.
• [DKM96] Tom Dietterich, Michael Kearns, and Yishay Mansour. Applying the weak learning framework to understand and improve C4.5. In Proceedings of the 13th International Conference on Machine Learning (ICML), pages 96–104, 1996.
• [EH89] Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231–246, 1989.
• [ES81] Bradley Efron and Charles Stein. The jackknife estimate of variance. The Annals of Statistics, pages 586–596, 1981.
• [FP04] Amos Fiat and Dmitry Pechyony. Decision trees: more theoretical justification for practical algorithms. In Proceedings of the 15th International Conference on Algorithmic Learning Theory (ALT), pages 156–170, 2004.
• [GKK08] Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), pages 527–536, 2008.
• [HKY18] Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: a spectral approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
• [Jon16] Chris Jones. A noisy-influence regularity lemma for Boolean functions. arXiv preprint, 2016. Available at https://arxiv.org/abs/1610.06950.
• [Kea96] Michael Kearns. Boosting theory towards practice: recent developments in decision tree induction and the weak learning framework (invited talk). In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), pages 1337–1339, 1996.
• [KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science (FOCS), pages 68–80, 1988.
• [KKMS08] Adam Kalai, Adam Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777–1805, 2008.
• [KL93] Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993.
• [KM93] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993.
• [KM99] Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and System Sciences, 58(1):109–128, 1999.
• [KOS04] Adam Klivans, Ryan O'Donnell, and Rocco Servedio. Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4):808–840, 2004.
• [O'D03] Ryan O'Donnell. Computational applications of noise sensitivity. PhD thesis, Massachusetts Institute of Technology, 2003.
• [O'D14] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. Available at http://analysisofbooleanfunctions.net/.
• [OS07] Ryan O'Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2007.
• [OSSS05] Ryan O'Donnell, Michael Saks, Oded Schramm, and Rocco Servedio. Every decision tree has an influential variable. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 31–39, 2005.
• [OSTW10] Ryan O'Donnell, Rocco Servedio, Li-Yang Tan, and Andrew Wan. A regularity lemma for low noisy-influences, 2010.
• [Qui86] Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
• [Qui93] Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
• [Ste86] Michael Steele. An Efron–Stein inequality for nonsymmetric statistics. The Annals of Statistics, 14(2):753–758, 1986.
• [Val85] Leslie Valiant. Learning disjunction of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI), pages 560–566, 1985.

Guy Blanc
Neha Gupta
Jane Lange
Li-Yang Tan
