Leveraging and Evaluating Automatic Code Summarization for JPA Program Comprehension

2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023)

Abstract
Accurate and up-to-date software documentation is an important factor in the maintenance and evolution of software systems. Especially with legacy software, documentation is often outdated or missing entirely, and manual redocumentation is not feasible. In recent years, automatic code summarization based on artificial neural network (ANN) models has been proposed to address this problem, and metric-based evaluations suggest promising quality of the generated summaries. To evaluate the applicability of state-of-the-art code summarization in an industry context, we conduct an expert evaluation to assess the quality of the generated summaries for JPA program comprehension. We then compare the level of quality perceived by human experts for both predicted and reference summaries, and discuss how these results are influenced by industry-specific requirements and how they correlate with automatically computed source code summary metrics. The results show that the quality of predicted summaries is predominantly (about 80%) poor in terms of accuracy and completeness. Moreover, the results support the growing consensus that the widely used BLEU and ROUGE-L scores are not suitable means of evaluating the quality of code summarization. While these metrics are an adequate means of comparison with existing related work, they cannot reflect the human-perceived level of quality in practice.
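To make the abstract's critique concrete, the sketch below shows why n-gram overlap metrics like BLEU can diverge from human judgment: a semantically equivalent paraphrase of a reference summary can score zero. `simple_bleu` is a hypothetical, simplified single-reference variant (modified n-gram precisions with a brevity penalty, no smoothing) written for illustration; it is not the implementation used in the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Illustrative only; real BLEU supports multiple references and smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision collapses the geometric mean
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# A paraphrase that a human would rate as accurate scores 0.0,
# because it shares no higher-order n-grams with the reference.
reference = "returns the customer entity for the given id".split()
paraphrase = "fetches a customer by id".split()
print(simple_bleu(reference, reference))   # identical summaries score 1.0
print(simple_bleu(reference, paraphrase))  # equivalent paraphrase scores 0.0
```

This mismatch between surface n-gram overlap and perceived accuracy is the effect the expert evaluation quantifies for JPA code summaries.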
Keywords
Software Maintenance, Automatic Code Summarization, Program Comprehension, Evaluation Metrics