Software Vulnerability and Functionality Assessment using LLMs
CoRR (2024)
Abstract
While code review is central to the software development process, it can be
tedious and expensive to carry out. In this paper, we investigate whether and
how Large Language Models (LLMs) can aid with code reviews. Our investigation
focuses on two tasks that we argue are fundamental to good reviews: (i)
flagging code with security vulnerabilities and (ii) performing software
functionality validation, i.e., ensuring that code meets its intended
functionality. To test performance on both tasks, we use zero-shot and
chain-of-thought prompting to obtain final “approve or reject”
recommendations. As data, we employ seminal code generation datasets (HumanEval
and MBPP) along with expert-written code snippets with security vulnerabilities
from the Common Weakness Enumeration (CWE). Our experiments consider a mixture
of three proprietary models from OpenAI and smaller open-source LLMs. We find
that the former outperform the latter by a large margin. Motivated by these
promising results, we finally ask our models to provide detailed descriptions
of security vulnerabilities. Results show that 36.7% of the descriptions can
be associated with true CWE vulnerabilities.
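As an illustration of the zero-shot "approve or reject" setup described above, a prompt could be built and the model's verdict parsed roughly as follows. This is a hypothetical sketch: the abstract does not give the authors' actual prompt wording or parsing logic, so both are assumptions here.

```python
# Hypothetical sketch of zero-shot approve/reject prompting for
# vulnerability flagging. The prompt text and the verdict-parsing rule
# are assumptions, not the paper's actual prompts.

def build_review_prompt(code: str) -> list[dict]:
    """Return chat-style messages asking for a binary security verdict."""
    system = (
        "You are a code reviewer. Decide whether the snippet contains "
        "a security vulnerability. Answer with exactly one word: "
        "APPROVE (no vulnerability) or REJECT (vulnerable)."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"```\n{code}\n```"},
    ]

def parse_verdict(reply: str) -> str:
    """Map a free-form model reply to 'approve' or 'reject'."""
    # Treat any mention of "reject" as a rejection; default to approve.
    return "reject" if "reject" in reply.strip().lower() else "approve"
```

The messages returned by `build_review_prompt` could be sent to any chat-completion LLM (proprietary or open-source, as in the paper's comparison), with `parse_verdict` reducing the reply to the final recommendation.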