Interpretable Software Defect Prediction Incorporating Multiple Rules
IEEE International Conference on Software Analysis, Evolution, and Reengineering (2023)
Abstract
Software defect prediction models are of great importance in software testing, but they suffer from a lack of interpretability. Association rules offer good accuracy and interpretability and are widely used in interpretable rule mining, yet current research shares some common problems: 1) class imbalance seriously degrades the accuracy of mined rules; 2) most studies treat all features as equally important and ignore each feature's contribution degree; 3) classification by default rules tends to reduce the accuracy of defect classification. Therefore, for the class-imbalance scenario, we propose a weighted association rule based on the contribution degree of features, which addresses the difficulty of mining defective rules while taking feature contribution into account. The process of rule generation, ranking, pruning, and prediction is optimized according to the weighted support of the rules, and an ensemble model incorporating multiple rules is built. Experimental results on the PROMISE dataset show that the proposed model achieves average F1 and MCC improvements of 6.4% and 9.8%, respectively, over current state-of-the-art classifiers; in terms of interpretability, the rule-based interpretation provides developers with better guidance on defect repair and risk avoidance than model-agnostic methods. The experimental results indicate that the contribution degree of features helps improve the quality of the rule set and that constructing diversified rules improves the accuracy of rule prediction.
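The abstract does not give the formula for weighted support, so the following is only a minimal sketch under stated assumptions. It uses one common formulation from weighted association rule mining: the mean weight of the antecedent features multiplied by the fraction of records that match both the antecedent and the class label. The function names, the feature items (`loc=high`, `cbo=high`), and the weight values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List


def weighted_support(
    antecedent: FrozenSet[str],
    label: str,
    records: List[FrozenSet[str]],
    weights: Dict[str, float],
) -> float:
    """Assumed definition: mean antecedent weight times plain support.

    This is one common weighted-support formulation, not necessarily
    the one used in the paper.
    """
    if not antecedent or not records:
        return 0.0
    mean_weight = sum(weights[item] for item in antecedent) / len(antecedent)
    # Count records that contain every antecedent item and the class label.
    matches = sum(1 for r in records if antecedent <= r and label in r)
    return mean_weight * matches / len(records)


def confidence(
    antecedent: FrozenSet[str], label: str, records: List[FrozenSet[str]]
) -> float:
    """Plain confidence: P(label | antecedent) over the records."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if label in r) / len(covered)


# Hypothetical discretized module records (items are illustrative only).
records = [
    frozenset({"loc=high", "cbo=high", "defective"}),
    frozenset({"loc=high", "cbo=low", "clean"}),
    frozenset({"loc=high", "cbo=high", "defective"}),
    frozenset({"loc=low", "cbo=low", "clean"}),
]

# Hypothetical feature contribution degrees used as item weights.
weights = {"loc=high": 0.75, "loc=low": 0.75, "cbo=high": 0.5, "cbo=low": 0.5}

rule_antecedent = frozenset({"loc=high", "cbo=high"})
ws = weighted_support(rule_antecedent, "defective", records, weights)
conf = confidence(rule_antecedent, "defective", records)
print(f"weighted support = {ws:.4f}, confidence = {conf:.2f}")
```

Under this assumed definition, rules built from higher-weight features rank above rules with the same raw support but lower-weight features, which is one way feature contribution could influence rule ranking and pruning.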
Key words
Defect Prediction, Association Rules, Feature Contribution Degree, Weighted Ensemble Model, Rule Interpretation