谷歌浏览器插件
订阅小程序
在清言上使用

DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction

IEEE Transactions on Software Engineering(2023)

引用 2|浏览26
暂无评分
摘要
Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., target) using historical data collected from other projects (i.e., source), which can help developers find defects and prioritize their testing efforts. Unfortunately, there often exists large distribution difference between the source and target data. Most CPDP methods neglect to select the appropriate source data for a given target at the project level. More importantly, existing CPDP models are parametric methods, which usually require intensive parameter selection and tuning to achieve better prediction performance. This would hinder wide applicability of CPDP in practice. Moreover, most CPDP methods do not address the cross-project class imbalance problem. These limitations lead to suboptimal CPDP results. In this paper, we propose a novel data selection and sampling based domain programming predictor (DSSDPP) for CPDP, which addresses the above limitations. DSSDPP is a non-parametric CPDP method, which can perform knowledge transfer across projects without the need for parameter selection and tuning. By exploiting the structures of source and target data, DSSDPP can learn a discriminative transfer classifier for identifying defects of the target project. Extensive experiments on 22 projects from four datasets indicate that DSSDPP achieves better MCC and AUC results against a range of competing methods both in the single-source and multi-source scenarios. Since DSSDPP is easy, effective, extensible, and efficient, we suggest that future work can use it with the well-chosen source data to conduct CPDP especially for the projects with limited computational budget.
更多
查看译文
关键词
Data models,Prediction algorithms,Measurement,Predictive models,Tuning,Transfer learning,Programming,Cross-project defect prediction,domain programming predictor,data selection,data sampling,transfer learning,software quality assurance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要