Theoretical Comparison between the Gini Index and Information Gain Criteria

Ann. Math. Artif. Intell., no. 1 (2004): 77-93



Introduction
  • Work in the field of decision tree construction focused mainly on the definition and realization of classification systems.
  • Once a certain number of algorithms had been defined, a lot of research was dedicated to comparing them.
  • This is a relatively hard task, as the different systems evolved from different backgrounds: information theory, discriminant analysis, encoding techniques, etc.
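The two split criteria compared throughout the paper can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: `gini`, `entropy`, and `split_score` are names chosen here for clarity, and `split_score` measures the impurity reduction of a binary split under either criterion.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_score(left, right, impurity):
    """Impurity reduction of a binary split under the given impurity measure.

    With `impurity=gini` this is the Gini Index criterion; with
    `impurity=entropy` it is the Information Gain criterion.
    """
    n = len(left) + len(right)
    parent = impurity(left + right)
    child = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return parent - child

# A split that perfectly separates the two classes gets the maximal
# score under both criteria.
left, right = ["a", "a"], ["b", "b"]
print(split_score(left, right, gini))     # 0.5
print(split_score(left, right, entropy))  # 1.0
```

Both functions reward the same extreme cases (pure children score best), which is why the interesting question, addressed by the paper, is how often they rank two *competing* splits differently.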
Highlights
  • Early work in the field of decision tree construction focused mainly on the definition and realization of classification systems.
  • The sign of the difference of the Gini Index functions corresponding to two tests T and T′, and of the corresponding Information Gain functions, is established for the six possible situations.
  • We presented a formal comparison of the behavior of two of the most popular split functions, namely the Gini Index function and the Information Gain function.
  • The situations where the two split functions agree/disagree on the selected split were mathematically characterized. Based on these characterizations, we were able to analyze the frequency of agreement/disagreement of the Gini Index function and the Information Gain function.
Results
  • The paper establishes a theoretical comparison between the Gini Index and Information Gain criteria.
  • The sign of the difference of the Gini Index functions corresponding to two tests T and T′, and of the corresponding Information Gain functions, is established for the six possible situations.
  • The authors present the details for one case as an illustration.
  • The sign of the difference of the Gini Index functions for the two tests determines which test that criterion selects.
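The frequency-of-agreement analysis can be illustrated empirically with a small simulation (a sketch under assumptions of this page, not the paper's method: the paper's result is analytical, over two-class nodes, whereas this randomly samples pairs of candidate binary splits and counts how often the two criteria prefer different splits).

```python
import random
from math import log2

def impurity(counts, measure):
    """Impurity of a node given its per-class counts."""
    n = sum(counts)
    ps = [c / n for c in counts if c]
    if measure == "gini":
        return 1.0 - sum(p * p for p in ps)
    return -sum(p * log2(p) for p in ps)  # entropy

def gain(parent, left, right, measure):
    """Impurity reduction of the split (left, right) of a parent node."""
    n = sum(parent)
    child = (sum(left) / n) * impurity(left, measure) \
          + (sum(right) / n) * impurity(right, measure)
    return impurity(parent, measure) - child

random.seed(0)
trials, disagree = 10_000, 0
for _ in range(trials):
    # A random two-class parent node and two candidate binary splits of it.
    parent = [random.randint(1, 50), random.randint(1, 50)]
    splits = []
    for _ in range(2):
        l = [random.randint(0, parent[0]), random.randint(0, parent[1])]
        r = [parent[0] - l[0], parent[1] - l[1]]
        if min(sum(l), sum(r)) == 0:  # skip degenerate (empty-child) splits
            break
        splits.append((l, r))
    if len(splits) < 2:
        continue
    g = [gain(parent, l, r, "gini") for l, r in splits]
    e = [gain(parent, l, r, "entropy") for l, r in splits]
    if (g[0] > g[1]) != (e[0] > e[1]):
        disagree += 1

print(f"disagreement rate over sampled split pairs: {disagree / trials:.3f}")
```

Under this sampling scheme the two criteria agree on the better split in the large majority of cases, consistent with the paper's conclusion that disagreement is rare.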
Conclusion
  • Conclusions and Future Work: In this paper, the authors presented a formal comparison of the behavior of two of the most popular split functions, namely the Gini Index function and the Information Gain function.
  • The situations where the two split functions agree/disagree on the selected split were mathematically characterized.
  • Based on these characterizations, the authors were able to analyze the frequency of agreement/disagreement of the Gini Index function and the Information Gain function.
  • The authors emphasize that the methodology introduced in this paper is not limited to the two analyzed split criteria.
  • They used it successfully to formalize and compare other split criteria.
  • Preliminary results can be found in [17].
Funding
  • This work was supported by grant number 2100-056986.99 from the Swiss National Science Foundation.
References
  • A. Babic, E. Krusinska, and J. E. Stromberg. Extraction of diagnostic rules using recursive partitioning systems: A comparison of two approaches. Artificial Intelligence in Medicine, 20(5):373–387, October 1992.
  • L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
  • R. López de Mántaras. A distance-based attribute selection measure for decision tree induction. Machine Learning, 6(1):81–92, 1991.
  • J. Gama and P. Brazdil. Characterization of classification algorithms. In C. Pinto-Ferreira and N. Mamede, editors, EPIA-95: Progress in Artificial Intelligence, 7th Portuguese Conference on Artificial Intelligence, pages 189–200. Springer Verlag, 1995.
  • Igor Kononenko. On biases in estimating multi-valued attributes. In Chris Mellish, editor, IJCAI-95: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1034–1040, Montreal, Canada, August 1995. Morgan Kaufmann Publishers Inc, San Mateo, CA.
  • Tjen-Sien Lim, Wei-Yin Loh, and Yu-Shan Shih. A comparison of prediction accuracy, complexity and training time of thirty-three old and new classification algorithms. Machine Learning, 1999.
  • John Mingers. An empirical comparison of selection measures for decision tree induction. Machine Learning, 3:319–342, 1989.
  • Masahiro Miyakawa. Criteria for selecting a variable in the construction of efficient decision trees. IEEE Transactions on Computers, 35(1):133–141, January 1986.
  • B. M. Moret. Decision trees and diagrams. Computing Surveys, 14(4):593–623, 1982.
  • Kolluru Venkata Sreerama Murthy. On Growing Better Decision Trees from Data. PhD thesis, The Johns Hopkins University, Baltimore, Maryland, 1995.
  • G. Pagallo. Adaptive Decision Tree Algorithms for Learning from Examples. PhD thesis, University of California, Santa Cruz, 1990.
  • J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
  • John Ross Quinlan. Simplifying decision trees. International Journal of Man-Machine Studies, (27):221–234, 1987.
  • Laura E. Raileanu. Theoretical comparison between the Gini Index and Information Gain functions. Technical report, Faculté de droit et des Sciences économiques, Université de Neuchâtel, 2000.
  • S. R. Safavian and D. Landgrebe. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man and Cybernetics, 21(3):660–674, 1991.
  • M. Sahami. Learning non-linearly separable boolean functions with linear threshold unit trees and madaline-style networks. In AAAI Press, editor, Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 335–341, 1993.
  • Kilian Stoffel and Laura E. Raileanu. Selecting optimal split-functions for large datasets. In Research and Development in Intelligent Systems XVII, BCS Conference Series, 2000.
  • Ricardo Vilalta and Daniel Oblinger. A quantification of distance-bias between evaluation metrics in classification. In Proceedings of the 17th International Conference on Machine Learning, Stanford University, 2000.
  • Allan P. White and Wei Zhang Liu. Bias in information-based measures in decision tree induction. Machine Learning, 15(3):321–328, June 1994.