Impact of Optimal Feature Selection Using Hybrid Method for a Multiclass Problem in Cross Project Defect Prediction

APPLIED SCIENCES-BASEL(2022)

引用 2|浏览1
暂无评分
摘要
The objective of cross-project defect prediction (CPDP) is to develop a model that trains bugs on current source projects and predicts defects of target projects. Due to the complexity of projects, CPDP is a challenging task, and the precision estimated is not always trustworthy. Our goal is to predict the bugs in the new projects by training our model on the current projects for cross-projects to save time, cost, and effort. We used experimental research and the type of research is explanatory. Our research method is controlled experimentation, for which our independent variable is prediction accuracy and dependent variables are hyper-parameters which include learning rate, epochs, and dense layers of neural networks. Our research approach is quantitative as the dataset is quantitative. The design of our research is 1F1T (1 factor and 1 treatment). To obtain the results, we first carried out exploratory data analysis (EDA). Using EDA, we found that the dataset is multi-class. The dataset contains 11 different projects consisting of 28 different versions of all the projects in total. We also found that the dataset has significant issues of noise, class imbalance, and distribution gaps between different projects. We pre-processed the dataset for experimentation by resolving all these issues. To resolve the issue of noise, we removed duplication from the dataset by removing redundant rows. We then covered the data distribution gap between different sources and target projects using the min-max normalization technique. After covering the data distribution gap, we generated synthetic data using a CTGANsynthesizer to solve class imbalance issues. We solved the class imbalance issue by generating an equal number of instances, as well as an equal number of output classes. After carrying out all of these steps, we obtained normalized data. We applied the hybrid feature selection technique on the pre-processed data to optimize the feature set. We obtained significant results of an average AUC of 75.98%. From the empirical study, it was demonstrated that feature selection and hyper-parameter tuning have a significant impact on defect prediction accuracy in cross-projects.
更多
查看译文
关键词
hyper-parameter tuning,cross project defect prediction (CPDP),generative adversarial networks (GAN),hybrid feature selection (HFS)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要