CCLearner: A Deep Learning-Based Clone Detection Approach

2017 IEEE International Conference on Software Maintenance and Evolution (ICSME)(2017)

引用 194|浏览64
暂无评分
摘要
Programmers produce code clones when developing software. By copying and pasting code with or without modification, developers reuse existing code to improve programming productivity. However, code clones present challenges to software maintenance: they may require consistent application of the same or similar bug fixes or program changes to multiple code locations. To simplify the maintenance process, various tools have been proposed to automatically detect clones [1], [2], [3], [4], [5], [6]. Some tools tokenize source code, and then compare the sequence or frequency of tokens to reveal clones [1], [3], [4], [5]. Some other tools detect clones using tree-matching algorithms to compare the Abstract Syntax Trees (ASTs) of source code [2], [6]. In this paper, we present CCLEARNER, the first solely token-based clone detection approach leveraging deep learning. CCLEARNER extracts tokens from known method-level code clones and nonclones to train a classifier, and then uses the classifier to detect clones in a given codebase. To evaluate CCLEARNER, we reused BigCloneBench [7], an existing large benchmark of real clones. We used part of the benchmark for training and the other part for testing, and observed that CCLEARNER effectively detected clones. With the same data set, we conducted the first systematic comparison experiment between CCLEARNER and three popular clone detection tools. Compared with the approaches not using deep learning, CCLEARNER achieved competitive clone detection effectiveness with low time cost.
更多
查看译文
关键词
deep learning,clone detection,empirical
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要