An Empirical Study of Skew-Insensitive Splitting Criteria and Its Application in Traditional Chinese Medicine
Intelligent Automation & Soft Computing (2014)
Abstract
Learning from imbalanced datasets is a challenging topic and plays an important role in the data mining community. Traditional splitting criteria such as information gain are sensitive to class distribution. To overcome this weakness, Cieslak and Chawla proposed Hellinger Distance Decision Trees (HDDT). Although HDDT outperforms traditional decision trees, other skew-insensitive splitting criteria may exist. In this paper, we propose several new skew-insensitive splitting criteria for decision tree construction and evaluate them in a comprehensive empirical framework against commonly used sampling and ensemble methods across 58 datasets. The experimental results demonstrate the superiority of these skew-insensitive decision trees on highly imbalanced datasets and their competitive performance on datasets with low imbalance. Among them, the K-L divergence-based decision tree (KLDDT) is the most robust in the presence of class imbalance, especially when combined with SMOTE. We therefore recommend KLDDT with SMOTE when learning from highly imbalanced datasets. Finally, we applied these skew-insensitive decision trees to build a diagnosis model for chronic obstructive pulmonary disease in traditional Chinese medicine. The results show that KLDDT is the most effective method.
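To make the idea of a skew-insensitive criterion concrete, here is a minimal sketch of the Hellinger-distance split score used by HDDT, plus a generic KL-divergence helper. Because both compare the *within-class* distributions of examples across branches rather than the raw class priors, their scores do not change when one class is heavily over- or under-represented. The exact KLDDT formulation proposed in the paper is not given in this abstract, so the KL helper below is only a standard smoothed implementation, and all function names are illustrative.

```python
import math

def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the two class-conditional branch
    distributions, as in HDDT. pos_counts[v] / neg_counts[v] are the
    numbers of positive / negative examples sent to branch v. Using
    P(v | class) instead of class priors makes the score skew-insensitive."""
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    return math.sqrt(sum(
        (math.sqrt(p / pos_total) - math.sqrt(n / neg_total)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))

def kl_divergence(p, q, eps=1e-12):
    """Smoothed D_KL(p || q) over two probability vectors; eps avoids
    log(0). A generic building block only -- not necessarily the exact
    criterion used by KLDDT."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

A split that perfectly separates the classes scores the maximum sqrt(2), while a split whose branches mirror the class-conditional distributions scores 0, regardless of how imbalanced the dataset is.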
Key words
Imbalanced learning, Skew-insensitive splitting criteria, Decision trees, Chronic obstructive pulmonary disease