CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation
arXiv (2024)
Abstract
Using large language models (LLMs) to generate code has shown promise for
transforming software development. Despite the general intelligence of these
models, their code generation performance can still be improved, owing to the
syntactic gap and vocabulary mismatch between natural language and the various
programming languages. In addition, programming languages are inherently
logical and complex, which makes correct code hard to generate. Existing
methods rely on prompting the large language model multiple times to explore
better solutions, which is expensive. In this paper, we
propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance
the performance of LLMs in single-round code generation tasks. CodeGRAG
extracts and summarizes the control flow and data flow of code blocks to fill
the gap between programming languages and natural language. The extracted
external structural knowledge models the inherent flows of code blocks, which
can help LLMs better understand code syntax and serve as a bridge among
different programming languages. CodeGRAG significantly improves the code
generation ability of LLMs and can even offer performance gains for
cross-lingual code generation, e.g., C++ for Python.
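
To make the idea of extracting control flow and data flow concrete, the following is a minimal illustrative sketch in Python. It is not the CodeGRAG implementation: the abstract does not specify the extraction procedure, so the function names (`extract_flows`, `summarize`), the sample snippet, and the textual summary format are all assumptions made for illustration. The data-flow part is a simple straight-line approximation that links each variable use to the most recent assignment above it in the source, ignoring branching.

```python
import ast

# Statement types treated as control-flow nodes in this sketch (an assumption).
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Return)


def extract_flows(source):
    """Return (control, data_edges) for a Python snippet.

    control:    sorted list of (line, node kind)
    data_edges: list of (definition line, use line, variable name)
    """
    tree = ast.parse(source)
    control = sorted(
        (node.lineno, type(node).__name__)
        for node in ast.walk(tree)
        if isinstance(node, CONTROL_NODES)
    )

    # Collect variable events as (line, is_store, column, name).
    # Sorting puts reads before writes on the same line, so that
    # `x = x + 1` reads the earlier definition of x before redefining it.
    events = []
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):  # function parameters act as definitions
            events.append((node.lineno, 1, node.col_offset, node.arg))
        elif isinstance(node, ast.Name):
            is_store = 0 if isinstance(node.ctx, ast.Load) else 1
            events.append((node.lineno, is_store, node.col_offset, node.id))

    last_def = {}    # variable name -> line of its most recent definition
    data_edges = []
    for lineno, is_store, _col, name in sorted(events):
        if is_store:
            last_def[name] = lineno
        elif name in last_def:  # skip builtins and other undefined names
            data_edges.append((last_def[name], lineno, name))
    return control, data_edges


def summarize(control, data_edges):
    """Render the extracted flows as short text (hypothetical format)."""
    ctrl = ", ".join(f"{kind}@L{line}" for line, kind in control)
    data = ", ".join(f"{name}: L{src}->L{dst}" for src, dst, name in data_edges)
    return f"Control flow: {ctrl}\nData flow: {data}"


SAMPLE = (
    "def total_even(nums):\n"
    "    total = 0\n"
    "    for n in nums:\n"
    "        if n % 2 == 0:\n"
    "            total = total + n\n"
    "    return total\n"
)

control, data_edges = extract_flows(SAMPLE)
summary = summarize(control, data_edges)
print(summary)
```

A summary of this kind could be attached to a retrieved code example before it is passed to an LLM. How CodeGRAG actually encodes and injects the graph is described in the paper itself, not here.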