A systematic evaluation of large language models for generating programming code
CoRR (2024)
Abstract
We systematically evaluated the performance of seven large language models in
generating programming code using various prompt strategies, programming
languages, and task difficulties. GPT-4 substantially outperforms other large
language models, including Gemini Ultra and Claude 2. The coding performance of
GPT-4 varies considerably with different prompt strategies. In most LeetCode
and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the
optimal prompt strategy outperforms 85 percent of human participants.
Additionally, GPT-4 demonstrates strong capabilities in translating code
between different programming languages and in learning from past errors. The
computational efficiency of the code generated by GPT-4 is comparable to that
of human programmers. These results suggest that GPT-4 has the potential to
serve as a reliable assistant in programming code generation and software
development.