Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction

ISSRE (2014)


Abstract

Accurate detection of defects prior to product release helps software engineers focus verification activities on defect prone modules, thus improving the effectiveness of software development. A common scenario is to use the defects from prior releases to build the prediction model for the upcoming release, typically through a supervised learning method. As software development is a dynamic process, fault characteristics in subsequent releases may vary. Therefore, supplementing the defect information from prior releases with limited information about the defects from the current release detected early seems to offer intuitive and practical benefits. We propose active learning as a way to automate the development of models which improve the performance of defect prediction between successive releases. Our results show that the integration of active learning with uncertainty sampling consistently outperforms the corresponding supervised learning approach. We further improve the prediction performance with feature compression techniques, where feature selection or dimensionality reduction is applied to defect data prior to active learning. We observe that dimensionality reduction techniques, particularly multidimensional scaling with random forest similarity, work better than feature selection due to their ability to identify and combine essential information in data set features. We present the improvements offered by this methodology through the prediction of defective modules in the three successive versions of Eclipse.
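The active learning loop with uncertainty sampling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the plain logistic regression is a stand-in classifier chosen to keep the example dependency-free, the function names (`train_logistic`, `most_uncertain`, `active_learning`), the `oracle` callback, and the toy data are all hypothetical, and the feature compression step (feature selection or multidimensional scaling with random forest similarity) is omitted.

```python
import math


def predict_proba(w, b, x):
    """Predicted probability that module x is defective."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit a plain logistic regression by batch gradient descent (stand-in classifier)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def most_uncertain(w, b, pool, candidates, k):
    """Uncertainty sampling: the k candidates whose probability is closest to 0.5."""
    return sorted(candidates, key=lambda i: abs(predict_proba(w, b, pool[i]) - 0.5))[:k]


def active_learning(X_prior, y_prior, X_current, oracle, rounds=5, batch=2):
    """Start from prior-release labels, then query labels for uncertain current-release modules."""
    X_train, y_train = list(X_prior), list(y_prior)
    unlabeled = set(range(len(X_current)))
    queried = []
    w, b = train_logistic(X_train, y_train)
    for _ in range(rounds):
        if not unlabeled:
            break
        picks = most_uncertain(w, b, X_current, sorted(unlabeled), batch)
        for i in picks:
            X_train.append(X_current[i])
            y_train.append(oracle(i))  # hypothetical: a person inspects module i
            unlabeled.discard(i)
            queried.append(i)
        w, b = train_logistic(X_train, y_train)  # retrain with the new labels
    return (w, b), queried


# Toy data (made up for illustration): one complexity-like feature per module.
X_prior = [[0.1], [0.2], [0.8], [0.9]]
y_prior = [0, 0, 1, 1]
X_current = [[0.05], [0.45], [0.55], [0.95], [0.5], [0.3]]
labels_current = [0, 0, 1, 1, 1, 0]

model, queried = active_learning(
    X_prior, y_prior, X_current, lambda i: labels_current[i], rounds=2, batch=2
)
```

In this sketch the labeling budget is `rounds * batch` modules from the current release; the modules left unqueried would then be scored with the final model.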


Keywords

random forest similarity, defects detection, software defect prediction, complexity measures, active learning, dimensionality reduction, machine learning, defective modules, learning (artificial intelligence), Eclipse, defect prone modules, defect information, uncertainty sampling, defect prediction, software development, multidimensional scaling, data set features, fault characteristics, uncertainty handling, software fault tolerance, supervised learning method, feature selection, configuration management, program verification, feature compression techniques, prediction model, software versions, verification activities
