Simplifying Software Defect Prediction (via the "early bird" Heuristic)

N. C. Shrikanth, Tim Menzies

arXiv (Cornell University) (2021)

Abstract

Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, then perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 GitHub projects, where we find that the information in those projects "clumped" towards the earliest parts of the project. A defect prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this early life cycle data, we can build models very quickly, very early in the software project life cycle. Moreover, using this method, we have shown that a simple model (with just two features) generalizes to hundreds of software projects. Based on this experience, we suspect that prior work on generalizing software engineering defect prediction models may have needlessly complicated an inherently simple process. Further, prior work that focused on later-life cycle data needs to be revisited, since its conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are online at https://github.com/snaraya7/simplifying-software-analytics
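
The "early bird" idea in the abstract (learn from only the first 150 commits, then predict the rest of the project) can be illustrated with a minimal sketch. Everything below other than the 150-commit cut-off and the use of two features is an assumption made for illustration: the commit history is synthetic, the two features are unnamed placeholders, and the hand-written logistic regression and the plain accuracy score are stand-ins, not necessarily the learner or the metrics used by the authors.

```python
import math
import random

EARLY_COMMITS = 150  # cut-off stated in the abstract


def predict_proba(weights, bias, x):
    """Probability that a commit is defective under a logistic model."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=300, lr=1.0):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical, synthetic history: 1000 commits in chronological order,
# each with two placeholder features scaled to [-1, 1] and a made-up label.
rng = random.Random(0)
commits = []
for _ in range(1000):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    commits.append((x, 1 if x[0] + x[1] > 0 else 0))

# Train on the early slice only, then evaluate on everything that follows.
early, later = commits[:EARLY_COMMITS], commits[EARLY_COMMITS:]
weights, bias = train_logistic([x for x, _ in early], [y for _, y in early])
hits = sum((predict_proba(weights, bias, x) >= 0.5) == bool(y) for x, y in later)
accuracy = hits / len(later)
print(f"trained on {len(early)} commits, accuracy on {len(later)} later commits: {accuracy:.2f}")
```

The split is chronological rather than random, which is the point of the heuristic: the model never sees any commit after the first 150. The replication repository linked in the abstract is the place to look for the authors' actual features, learners, and evaluation measures.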

Keywords

software defect prediction, heuristic
