
Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. CCF B

Abstract
Automated software debugging is a crucial task for improving the productivity of software developers. Many neural-based techniques have been proven effective for debugging-related tasks such as bug localization and program repair (or bug fixing). However, these techniques often focus only on either one of them or approach them in a stage-wise manner, ignoring the mutual benefits between them. In this work, we propose a novel unified Detect-Localize-Repair framework based on a pretrained programming language model CodeT5 to seamlessly address these tasks, named CodeT5-DLR. Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy or not, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code to its fixed version. We evaluate it on each of these tasks and their combined setting on two newly collected line-level debugging datasets in Java and Python. Extensive results show that our model significantly outperforms existing baselines from both NLP and software engineering domains.
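The abstract describes adapting a single text-to-text model to three debugging objectives. As a rough illustration (not the paper's actual implementation), the sketch below shows how such tasks might be framed as source/target text pairs for a sequence-to-sequence model; the task prefixes (`defect:`, `localize:`, `repair:`) and the exact input/output formats are assumptions for illustration only.

```python
# Hypothetical sketch of framing the three CodeT5-DLR debugging tasks as
# text-to-text examples. The prefixes and formats are illustrative
# assumptions, not the paper's actual prompts.

def detection_example(code: str, is_buggy: bool) -> dict:
    # Bug detection: classify a snippet as buggy or clean, with the
    # label generated as a single token ("true"/"false").
    return {"source": f"defect: {code}", "target": "true" if is_buggy else "false"}

def localization_example(code_lines: list, buggy_line_indices: list) -> dict:
    # Bug localization: number each line in the input; the target lists
    # the indices of the buggy lines.
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(code_lines))
    target = " ".join(str(i) for i in sorted(buggy_line_indices))
    return {"source": f"localize: {numbered}", "target": target}

def repair_example(buggy_code: str, fixed_code: str) -> dict:
    # Program repair: translate the buggy snippet into its fixed version.
    return {"source": f"repair: {buggy_code}", "target": fixed_code}
```

Framing all three objectives over the same encoder-decoder lets the tasks share representations, which is the mutual benefit the abstract argues a stage-wise pipeline forgoes.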
Key words
Bug Localization, Software Fault Localization, Fault Localization, Software Defect Prediction, Code Clone Detection

Key points: This paper proposes the Detect-Localize-Repair framework, which uses the pretrained CodeT5 model to unify detection, localization, and repair in software debugging, improving the efficiency and quality of automated debugging.

Method: The authors design three training objectives (bug detection, bug localization, and program repair) to adapt CodeT5 to the demands of debugging tasks.

Experiments: The researchers evaluated CodeT5-DLR on two newly collected line-level debugging datasets in Java and Python; results show that the model significantly outperforms existing baselines from both the NLP and software engineering domains on every task.