Python Open-Source Code Traceability Model Based on Graph Neural Networks.

Ruizhi Wang, Yanping Xu, Yifan Wu

2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech) (2023)

Abstract
When programmers write project code, they may copy or reference open-source code that contains defects, which introduces vulnerabilities into the project and threatens the security of the software supply chain. To protect code security, we propose a Python open-source code traceability model based on graph neural networks that calculates the similarity between programmers' Python code and Python open-source code. First, each function in the Python code is parsed into a Type Abstract Syntax Tree. Second, graph neural networks calculate the function similarity between the Type Abstract Syntax Trees of the original code and the open-source code. Third, the overall similarity of a Python project consisting of many functions is calculated from the function similarities following the maximum retention principle. Experiments were conducted on three datasets: StudentWork, GitDown, and Obfuscated-GitDown. The results show that our model yields more reasonable similarity scores because it places more emphasis on similarity in code structure than in code text. In the Pyobfuscate obfuscation scenario, for example, our structure-aware model reports similarity 18.12% to 39.54% higher than methods that calculate similarity from code text.
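The three-step pipeline in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graph neural network is replaced by a simple multiset Jaccard similarity over AST node types (a stand-in), Python's built-in `ast` module is used instead of the paper's Type Abstract Syntax Tree, and the "maximum retention principle" is assumed here to mean keeping, for each project function, its best match among the open-source functions and then averaging. All function names below are hypothetical.

```python
import ast
from collections import Counter
from typing import Dict


def function_node_types(source: str) -> Dict[str, Counter]:
    """Map each function name to a multiset of the AST node types it contains."""
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count node type names only, so identifier renaming has no effect.
            functions[node.name] = Counter(
                type(child).__name__ for child in ast.walk(node)
            )
    return functions


def structural_similarity(a: Counter, b: Counter) -> float:
    """Multiset Jaccard similarity (assumed stand-in for the GNN score)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def project_similarity(project_src: str, open_src: str) -> float:
    """Assumed aggregation: keep each function's best match, then average."""
    mine = function_node_types(project_src)
    theirs = function_node_types(open_src)
    if not mine or not theirs:
        return 0.0
    best = [
        max(structural_similarity(f, g) for g in theirs.values())
        for f in mine.values()
    ]
    return sum(best) / len(best)


original = "def add(a, b):\n    total = a + b\n    return total\n"
renamed = "def f1(x, y):\n    z = x + y\n    return z\n"
different = "def g():\n    for i in range(3):\n        print(i)\n"

same_structure_score = project_similarity(renamed, original)
different_structure_score = project_similarity(different, original)
```

Because the comparison ignores identifier text, the renamed function still matches the original structurally, which mirrors the abstract's point that structure-based similarity is more robust to obfuscation than text-based similarity.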
Keywords
Graph neural network, Code similarity calculation, Type Abstract Syntax Tree, Python open-source code, Code traceability, Software supply chain security