Predicting Bug-Fixing Time: DistilBERT Versus Google BERT

Product-Focused Software Process Improvement (2022)

Abstract
Predicting bug-fixing time can be treated as a supervised text categorization task in Natural Language Processing. In recent years, with the adoption of deep learning in Natural Language Processing, pre-trained contextualized representations of words have become widespread. One of the most widely used pre-trained language representation models is Google BERT (hereinafter, for brevity, BERT). BERT uses a self-attention mechanism that learns a bidirectional context representation of each word in a sentence, which is one of its main advantages over previously proposed solutions. However, because of its large size, BERT is difficult to put into production. To address this issue, a smaller, faster, cheaper and lighter version of BERT, named DistilBERT, was introduced at the end of 2019. This paper compares the efficacy of BERT and DistilBERT, each combined with Logistic Regression, in predicting bug-fixing time from the bug reports of a large-scale open-source software project, LiveCode. In the experiments carried out, DistilBERT retains almost 100% of BERT's language understanding capabilities and, in the best case, is 63.28% faster than BERT. Moreover, with an inexpensive tuning of the C parameter in Logistic Regression, DistilBERT achieves an accuracy even higher than that of BERT.
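The tuning of the C parameter mentioned in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, whose `LogisticRegression` exposes the inverse regularization strength as `C`; the abstract does not name the library or the grid the authors used. The feature matrix is synthetic data standing in for sentence embeddings, the binary label (for example, fast versus slow fix) is hypothetical, and the grid values are illustrative only. None of it reproduces the LiveCode experiment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for embedding vectors (assumption: NOT real bug-report data).
# 768 is the hidden size of the base BERT and DistilBERT models.
X, y = make_classification(
    n_samples=400, n_features=768, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid for C (inverse of the L2 regularization strength).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated search over C, scored by accuracy.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best_c = search.best_params_["C"]
# GridSearchCV refits on the full training split with the best C by default,
# so score() evaluates that refitted model on the held-out split.
test_accuracy = search.score(X_test, y_test)
print(f"best C = {best_c}, held-out accuracy = {test_accuracy:.3f}")
```

In a real pipeline, `X` would be replaced by the vectors produced by BERT or DistilBERT for each bug report. Because the classifier is fitted on fixed feature vectors, a search over a handful of C values is cheap compared with running the language model itself.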
Keywords
Google BERT, DistilBERT, Bug-fixing, Deep learning, Software maintenance process, Defect tracking systems