Java Code Clone Detection by Exploiting Semantic and Syntax Information From Intermediate Code-Based Graph

IEEE Transactions on Reliability (2023)

Abstract
Code clone detection plays a critical role in software engineering. Finding "functional" clones by hand requires extensive development experience, which puts novice developers at a disadvantage. Although many approaches have been proposed to detect code clones automatically, their results are not satisfactory. A major reason is that syntactic and semantic information is difficult to extract from source code. To resolve this problem, we develop a novel graph representation approach for detecting functional code clones. The graph representation is built from intermediate code compiled from the source code. With it, graph embedding techniques can readily extract syntactic and semantic features from the abstract syntax tree (AST), control flow graph (CFG), and data flow graph (DFG) generated from the intermediate code. A Softmax classifier then detects functional code clone pairs. To improve performance, the embedded representation of the intermediate code is initialized with pretrained vectors learned in advance from a collected LLVM IR dataset. We evaluate the approach on the code clone detection task using the BigCloneBench dataset. The experimental results show that the proposed intermediate code-based graph approach performs better than existing functional code clone detection approaches. For type-4 code clone detection in particular, our approach outperforms the baseline approaches by an average of 33.49% in terms of F1 score.
Keywords
Codes, Cloning, Syntactics, Semantics, Data mining, Task analysis, Feature extraction, Abstract syntax tree (AST), clone code detection, control flow graph (CFG), data flow graph (DFG), graph embedding, intermediate code
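
The final stage described in the abstract, a Softmax classifier over a pair of code representations scored by F1, can be illustrated with a minimal sketch. This is not the authors' implementation: the pair features (`pair_features`), the linear layer parameters (`weights`, `biases`), and the toy embeddings are all hypothetical stand-ins, and the graph embedding step that would produce the vectors is assumed to have already run.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pair_features(u, v):
    """Hypothetical way to combine two embeddings: |u - v| and u * v, concatenated."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return diff + prod


def classify_pair(u, v, weights, biases):
    """Return [P(not clone), P(clone)] from an assumed linear layer plus softmax."""
    feats = pair_features(u, v)
    logits = [
        sum(w * f for w, f in zip(row, feats)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2-dimensional embeddings and made-up parameters, for illustration only.
emb_a = [0.9, 0.1]
emb_b = [0.8, 0.2]
weights = [[1.0, 1.0, -1.0, -1.0],   # row for the "not clone" class
           [-1.0, -1.0, 1.0, 1.0]]   # row for the "clone" class
biases = [0.0, 0.0]
probs = classify_pair(emb_a, emb_b, weights, biases)
```

In a trained model the weights would be learned jointly with the embeddings; here they are fixed by hand only so that the sketch runs end to end.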