Security Code Review by LLMs: A Deep Dive into Responses
CoRR (2024)
Abstract
Security code review aims to combine automated tools and manual efforts to
detect security defects during development. The rapid development of Large
Language Models (LLMs) has shown promising potential in software development,
and has opened up new possibilities for automated security code review. To
explore the challenges of applying LLMs in practical code review for security
defect detection, this study compared the detection performance of three
state-of-the-art LLMs (Gemini Pro, GPT-4, and GPT-3.5) under five prompts on
549 code files that contain security defects from real-world code reviews.
Through analyzing 82 responses generated by the best-performing LLM-prompt
combination based on 100 randomly selected code files, we extracted and
categorized quality problems present in these responses into 5 themes and 16
categories. Our results indicate that the responses produced by LLMs often
suffer from verbosity, vagueness, and incompleteness, highlighting the
necessity to enhance their conciseness, understandability, and compliance with
security defect detection. This work reveals the deficiencies of LLM-generated
responses in security code review and paves the way for future optimization of
LLMs towards this task.