
Universal guarantees for decision tree induction via a higher-order splitting criterion

NIPS 2020

Abstract

We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simp…

Introduction
  • Decision trees have long been a workhorse of machine learning. Their simple hierarchical structure makes them appealing in terms of interpretability and explanatory power.
  • The authors design a simple top-down decision tree learning algorithm based on the new splitting criterion (⋆), and show that it achieves provable guarantees for all target functions f : {±1}^n → {±1}, i.e. a universal guarantee of the form (♦).
  • Given a partial tree T◦ and a target function f : {±1}^n → {±1}, there is a canonical completion of T◦ that minimizes the classification error error_f(T): label every leaf ℓ of T◦ with sign(E[f_ℓ]), where f_ℓ denotes the subfunction of f obtained by restricting its attributes according to the path that leads to ℓ. A minimal sketch of this completion step is given below.
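
To make the canonical completion concrete, here is a minimal Python sketch (hypothetical interface, not the authors' code): it estimates sign(E[f_ℓ]) at each leaf of a partial tree by drawing uniform samples consistent with that leaf's root-to-leaf restriction and taking a majority vote.

    import random

    def canonical_completion(leaves, f, n, samples=10_000):
        # `leaves` maps a leaf id to its restriction {coordinate: +/-1}, i.e.
        # the attribute values fixed along the root-to-leaf path.
        # `f` is the target function on {-1, +1}^n.  (Hypothetical interface.)
        labels = {}
        for leaf, restriction in leaves.items():
            total = 0
            for _ in range(samples):
                # Draw x uniformly from {-1, +1}^n, then force it into the leaf.
                x = [random.choice((-1, 1)) for _ in range(n)]
                for i, b in restriction.items():
                    x[i] = b
                total += f(x)
            # Label the leaf with sign(E[f_leaf]): the majority value of f there.
            labels[leaf] = 1 if total >= 0 else -1
        return labels

For example, if the partial tree queries only x_0 and the target is the majority of three bits, each of the two leaves receives the value of x_0, since that is the majority value of f conditioned on the leaf's restriction.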
Highlights
  • Decision trees have long been a workhorse of machine learning
  • We depart from the overall theme of prior works: instead of analyzing the performance of impurity-based heuristics when run on restricted classes of target functions, we propose a simple and efficient extension of these heuristics, and show that it achieves the sought-for universal guarantee (♦)
  • We design a simple top-down decision tree learning algorithm based on our new splitting criterion (⋆), and show that it achieves provable guarantees for all target functions f : {±1}^n → {±1}, i.e. a universal guarantee of the form (♦); a generic skeleton of such a top-down procedure is sketched after this list
  • Theorem 1 should be contrasted with the impossibility result discussed in the introduction, which shows that there are very simple target functions for which opt_s = 0, and yet any impurity-based heuristic may build a complete tree of size Ω(2^n) (in time Ω(2^n)) before achieving any non-trivial error
  • We will show that Theorem 1 follows as a consequence of the following more general result, which gives a performance guarantee for BuildStabilizingDT that scales according to the noise sensitivity of the target function f
  • We have designed a new top-down decision tree learning algorithm that achieves provable guarantees for all target functions f : {±1}^n → {±1} with respect to the uniform distribution. This circumvents impossibility results showing that such a universal guarantee is provably unachievable by existing top-down heuristics such as ID3, C4.5, and CART
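
The extension leaves the overall top-down loop of ID3, C4.5, and CART untouched and only changes the score used to choose the next split. The following Python skeleton is a hedged illustration of that shared loop (hypothetical interface; the paper's actual criterion (⋆) and how it is estimated are not reproduced here): grow the tree one split at a time, always splitting the (leaf, coordinate) pair that maximizes the supplied splitting criterion.

    def build_top_down(n, size, score):
        # Generic greedy top-down induction of a tree with `size` leaves.
        # `score(restriction, i)` returns the splitting-criterion value for
        # querying coordinate i at the leaf described by `restriction`
        # (an impurity-based gain for ID3/C4.5/CART, or the higher-order
        # criterion (*) in the authors' extension).  Hypothetical interface.
        leaves = [dict()]  # a single unrestricted leaf: the root
        while len(leaves) < size:
            # Pick the split with the highest criterion value over all leaves.
            leaf, i = max(
                ((l, i) for l in leaves for i in range(n) if i not in l),
                key=lambda pair: score(pair[0], pair[1]),
            )
            leaves.remove(leaf)
            leaves.append({**leaf, i: -1})
            leaves.append({**leaf, i: +1})
        return leaves  # then label the leaves via the canonical completion above

Each iteration replaces one leaf by an internal node with two children, so the loop adds exactly one leaf per step and stops once the tree has the requested number of leaves.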
Results
  • Theorem 1 should be contrasted with the impossibility result discussed in the introduction, which shows that there are very simple target functions for which opt_s = 0, and yet any impurity-based heuristic may build a complete tree of size Ω(2^n) (in time Ω(2^n)) before achieving any non-trivial error
  • Recent work of Blanc, Lange, and Tan [BLT20b] gives a top-down algorithm for learning decision trees that achieves provable guarantees for all target functions f.
  • Fact 3.1 motivates the following question: given a partial decision tree T◦ and a target function f, is there a leaf ℓ⋆ …
  • The authors will show that Theorem 1 follows as a consequence of the following more general result, which gives a performance guarantee for BuildStabilizingDT that scales according to the noise sensitivity of the target function f.
  • The authors write T_j◦ to denote the size-j partial decision tree that BuildStabilizingDT constructs after j − 1 iterations.
  • The authors use the potential function, the noise sensitivity of f with respect to a partial decision tree T◦ (Definition 4), to bound this number; a standard noise-sensitivity estimator is sketched after this list.
  • The authors have designed a new top-down decision tree learning algorithm that achieves provable guarantees for all target functions f : {±1}^n → {±1} with respect to the uniform distribution.
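
For reference, the noise sensitivity underlying this potential is a standard quantity from the analysis of Boolean functions: NS_δ(f) = Pr[f(x) ≠ f(y)], where x is uniform over {-1, +1}^n and y is obtained from x by flipping each coordinate independently with probability δ. (The paper's potential is the variant defined relative to a partial tree T◦, per Definition 4; the sketch below only estimates the plain quantity.)

    import random

    def noise_sensitivity(f, n, delta, samples=100_000):
        # Monte Carlo estimate of NS_delta(f) = Pr[f(x) != f(y)], where y is a
        # delta-noisy copy of the uniformly random x.  Hypothetical interface.
        disagreements = 0
        for _ in range(samples):
            x = [random.choice((-1, 1)) for _ in range(n)]
            y = [-xi if random.random() < delta else xi for xi in x]
            disagreements += f(x) != f(y)
        return disagreements / samples

Shallow decision trees are noise-stable (a depth-d tree has NS_δ at most δ·d), which is the intuition behind viewing the top-down construction as a noise stabilization procedure and tracking a noise-sensitivity potential.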
Conclusion
  • The analysis of the algorithm draws on Fourier-analytic techniques, and is based on a view of the top-down decision tree construction process as a noise stabilization procedure.
  • The focus of the work has been on achieving a universal guarantee: the new algorithm constructs, for all target functions f and size parameters s, a decision tree hypothesis of quasipoly(s) size that achieves error close to that of the best size-s decision tree for f.
  • Are there broad and natural subclasses of target functions for which the quasipoly(s) bound on the size of the decision tree hypothesis can be improved to poly(s), or even O(s)?
Funding
  • Guy, Jane, and Li-Yang were supported by NSF award CCF-192179 and NSF CAREER award CCF-1942123
  • Neha was supported by NSF award 1704417
References
  • [BDM19a] Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. ArXiv, abs/1906.08654, 2019.
  • [BDM19b] Alon Brutzkus, Amit Daniely, and Eran Malach. On the optimality of trees generated by ID3. ArXiv, abs/1907.05444, 2019.
  • [BEK02] Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
  • Guy Blanc, Jane Lange, and Li-Yang Tan. Provable guarantees for decision tree induction: the agnostic setting. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. Available at https://arxiv.org/abs/2006.00743.
  • Guy Blanc, Jane Lange, and Li-Yang Tan. Top-down induction of decision trees: rigorous guarantees and inherent limitations. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 1–44, 2020.
  • Avrim Blum. Rank-r decision trees are a subclass of r-decision lists. Information Processing Letters, 42(4):183–185, 1992.
  • [Bre17] Leo Breiman. Classification and regression trees. Routledge, 2017.
  • Sitan Chen and Ankur Moitra. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 869–880, 2019.
  • [DKM96] Tom Dietterich, Michael Kearns, and Yishay Mansour. Applying the weak learning framework to understand and improve C4.5. In Proceedings of the 13th International Conference on Machine Learning (ICML), pages 96–104, 1996.
  • Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231–246, 1989.
  • Bradley Efron and Charles Stein. The jackknife estimate of variance. The Annals of Statistics, pages 586–596, 1981.
  • Amos Fiat and Dmitry Pechyony. Decision trees: more theoretical justification for practical algorithms. In Proceedings of the 15th International Conference on Algorithmic Learning Theory (ALT), pages 156–170, 2004.
  • [GKK08] Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), pages 527–536, 2008.
  • [HKY18] Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: a spectral approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
  • Chris Jones. A noisy-influence regularity lemma for Boolean functions. arXiv preprint, 2016. Available at https://arxiv.org/abs/1610.06950.
  • Michael Kearns. Boosting theory towards practice: recent developments in decision tree induction and the weak learning framework (invited talk). In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), pages 1337–1339, 1996.
  • Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science (FOCS), pages 68–80, 1988.
  • [KKMS08] Adam Kalai, Adam Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777–1805, 2008.
  • Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993.
  • [KM93] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, December 1993.
  • Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and System Sciences, 58(1):109–128, 1999.
  • [KOS04] Adam Klivans, Ryan O’Donnell, and Rocco Servedio. Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4):808–840, 2004.
  • [O’D03] Ryan O’Donnell. Computational applications of noise sensitivity. PhD thesis, Massachusetts Institute of Technology, 2003.
  • [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. Available at http://analysisofbooleanfunctions.net/.
  • Ryan O’Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2007.
  • [OSSS05] Ryan O’Donnell, Michael Saks, Oded Schramm, and Rocco Servedio. Every decision tree has an influential variable. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 31–39, 2005.
  • [OSTW10] Ryan O’Donnell, Rocco Servedio, Li-Yang Tan, and Andrew Wan. A regularity lemma for low noisy-influences, 2010.
  • Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
  • Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
  • Michael Steele. An Efron–Stein inequality for nonsymmetric statistics. The Annals of Statistics, 14(2):753–758, 1986.
  • Leslie Valiant. Learning disjunction of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI), pages 560–566, 1985.
Authors
Guy Blanc
Neha Gupta
Jane Lange
Li-Yang Tan