SCDetector: software functional clone detection based on semantic tokens analysis

ASE(2020)

引用 24|浏览148
暂无评分
摘要
ABSTRACTCode clone detection is to find out code fragments with similar functionalities, which has been more and more important in software engineering. Many approaches have been proposed to detect code clones, in which token-based methods are the most scalable but cannot handle semantic clones because of the lack of consideration of program semantics. To address the issue, researchers conduct program analysis to distill the program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector to combine the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given a function source code, we first extract the control flow graph by static analysis. Instead of using traditional heavyweight graph matching, we treat the graph as a social network and apply social-network-centrality analysis to dig out the centrality of each basic block. Then we assign the centrality to each token in a basic block and sum the centrality of the same token in different basic blocks. By this, a graph is turned into certain tokens with graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese architecture neural network to train a code clone detector. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to four state-of-the-art methods (i.e., SourcererCC, Deckard, RtvNN, and ASTNN) and the time cost of SCDetector is 14 times less than a traditional graph-based method (i.e., CCSharp) on detecting semantic clones.
更多
查看译文
关键词
Social Network Centrality, Semantic Tokens, Siamese Network
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要