CPGBERT: An Effective Model for Defect Detection by Learning Program Semantics via Code Property Graph.

Jingqiang Liu, Xiaoxi Zhu, Chaoge Liu, Xiang Cui, Qixu Liu

TrustCom (2022)

Abstract
As software composition grows more complex, code defects have become a long-standing problem in software security. Traditional static analysis cannot exhaustively enumerate all unsafe patterns, and dynamic detection suffers from problems such as low path coverage, which together make software vulnerability detection inefficient. Methods based on natural language processing have advanced research on code defect detection; however, they learn code semantics insufficiently, and pre-trained models are limited in the data they can process. To address these problems, we enrich the semantics of the model input and improve the model's capacity to process data: building on the Transformer, we propose CPGBERT, a hierarchical compression encoder model that detects whether a target function contains defects. Exploiting the regularity of program context and structure, we slice the program code along the input and output variables related to the target function and along the dependencies on the code's propagation paths. From the sliced code we extract multiple semantically rich code property graphs, fuse them, and embed the fused code property graph into the model in groups. During learning, the independent hidden-layer features are compressed and aggregated so that the model focuses on learning the deep semantics of the target function. The experiments use the CodeXGLUE benchmark dataset and compare CPGBERT against six well-performing code defect detection models, evaluating defect detection on real-world engineering code. The results show that CPGBERT achieves a detection accuracy of 67.97%, which is 5.89% higher than the CodeBERT model proposed by Microsoft and 1.35% higher than the state-of-the-art model CoTexT.
Keywords
Software Security, Code Analysis, Defect Detection, Code Property Graph, Natural Language Processing
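
The abstract describes three steps at a high level: fusing several graphs extracted from the sliced code, embedding the fused graph in groups, and compressing each group's hidden features. The following is a minimal sketch of that flow, not the authors' implementation. The edge labels "AST", "CFG", and "PDG" are assumed (they are the components a code property graph commonly combines; the abstract does not list them), the function names are hypothetical, and mean pooling stands in for the compression step, whose actual form the abstract does not specify.

```python
from collections import defaultdict


def fuse_graphs(graphs):
    """Merge several edge lists over shared node ids into one labeled graph.

    graphs maps an edge-type label (assumed: "AST", "CFG", "PDG") to a list
    of (src, dst) pairs. The result maps each (src, dst) edge to the sorted
    list of labels under which it appears.
    """
    fused = defaultdict(set)
    for label, edges in graphs.items():
        for src, dst in edges:
            fused[(src, dst)].add(label)
    return {edge: sorted(labels) for edge, labels in fused.items()}


def group_nodes(node_ids, group_size):
    """Split node ids into consecutive groups; the last group may be shorter."""
    return [node_ids[i:i + group_size]
            for i in range(0, len(node_ids), group_size)]


def compress_groups(groups, features):
    """Mean-pool the feature vectors of each group into one vector per group.

    Mean pooling is an illustrative stand-in for the paper's compression.
    """
    pooled = []
    for group in groups:
        dim = len(features[group[0]])
        pooled.append([sum(features[n][d] for n in group) / len(group)
                       for d in range(dim)])
    return pooled


# Toy example: three nodes, three hypothetical graph views of the same slice.
graphs = {"AST": [(0, 1), (0, 2)], "CFG": [(1, 2)], "PDG": [(0, 2)]}
fused = fuse_graphs(graphs)

features = {0: [1.0, 0.0], 1: [3.0, 2.0], 2: [5.0, 4.0]}
groups = group_nodes([0, 1, 2], group_size=2)
pooled = compress_groups(groups, features)
```

In this toy run, the edge (0, 2) appears in both the assumed AST and PDG views, so the fused graph records both labels on a single edge, and the three node vectors are reduced to one vector per group. A real encoder would replace the hand-written features with learned hidden states.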