V-STaR: Training Verifiers for Self-Taught Reasoners
CoRR (2024)
Abstract
Common self-improvement approaches for large language models (LLMs), such as
STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated
solutions to improve their problem-solving ability. However, these approaches
discard the large number of incorrect solutions generated during this process,
potentially neglecting valuable information in such solutions. To address this
shortcoming, we propose V-STaR, which uses both the correct and incorrect
solutions generated during the self-improvement process to train, with DPO, a
verifier that judges the correctness of model-generated solutions. This verifier
is used at inference time to select one solution among many candidate
solutions. Running V-STaR for multiple iterations results in progressively
better reasoners and verifiers, delivering a 4
improvement over existing self-improvement and verification approaches on
common code generation and math reasoning benchmarks with LLaMA2 models.
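The verifier-training step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `build_preference_pairs`, `softplus`, and `dpo_loss` are hypothetical, pairing every correct solution with every incorrect solution for the same problem is an assumption about how preference pairs are formed, and the default `beta=0.1` is an illustrative value. The loss itself is the standard DPO objective, computed from summed sequence log-probabilities under the model being trained and a frozen reference model.

```python
import math
from itertools import product


def build_preference_pairs(problem, solutions, is_correct):
    """Hypothetical helper: pair each correct solution with each incorrect one.

    `solutions` holds model-generated solutions for one problem and `is_correct`
    is a parallel list of booleans (e.g. from test cases or answer matching).
    Exhaustive correct x incorrect pairing is an assumption of this sketch.
    """
    correct = [s for s, ok in zip(solutions, is_correct) if ok]
    incorrect = [s for s, ok in zip(solutions, is_correct) if not ok]
    # Each tuple is (prompt, chosen/preferred, rejected/dispreferred).
    return [(problem, good, bad) for good, bad in product(correct, incorrect)]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair from sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    beta=0.1 is an illustrative default, not a value taken from the paper.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


pairs = build_preference_pairs("p", ["a", "b", "c"], [True, False, False])
print(pairs)  # [('p', 'a', 'b'), ('p', 'a', 'c')]
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # log(2), about 0.6931
```

When the trained model raises the likelihood of the correct solution relative to the reference more than it raises the incorrect one, the margin is positive and the loss drops below log 2. This is how incorrect solutions, which STaR-style training discards, still contribute a learning signal.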
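At inference time, the verifier selects one solution out of many sampled candidates. Below is a minimal best-of-k sketch, assuming the verifier is exposed as a hypothetical callable `verifier_score(problem, solution)` that returns a higher value for solutions it judges more likely to be correct. How that score is derived from the DPO-trained model is not specified here, and `toy_score` is a stand-in used only to make the example runnable.

```python
from typing import Callable, Sequence


def select_with_verifier(problem: str,
                         candidates: Sequence[str],
                         verifier_score: Callable[[str, str], float]) -> str:
    """Best-of-k selection: return the candidate with the highest verifier score.

    `verifier_score` is a hypothetical callable standing in for the trained
    verifier; a higher score means "more likely correct".
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=lambda sol: verifier_score(problem, sol))


# Toy stand-in verifier for illustration only: prefers answers ending in "42".
def toy_score(problem: str, solution: str) -> float:
    return 1.0 if solution.strip().endswith("42") else 0.0


print(select_with_verifier("What is 6 * 7?", ["41", "6 * 7 = 42", "43"], toy_score))
# prints: 6 * 7 = 42
```

Ties go to the earliest candidate, because Python's `max` returns the first maximal element it encounters.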