Optimizing Language Model's Reasoning Abilities with Weak Supervision
arXiv (2024)
Abstract
While Large Language Models (LLMs) have demonstrated proficiency in handling
complex queries, much of the past work has depended on datasets extensively
annotated by human experts. However, this reliance on fully-supervised
annotations poses scalability challenges, particularly as models and data
requirements grow. To mitigate this, we explore the potential of enhancing
LLMs' reasoning abilities with minimal human supervision. In this work, we
introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of
the model using a small collection of annotated questions. Then it iteratively
improves LLMs by learning from the differences in responses from the SFT and
unfinetuned models on unlabeled questions. Our approach provides an efficient
approach without relying heavily on extensive human-annotated explanations.
However, current reasoning benchmarks typically include only golden-reference
answers or rationales. Therefore, we present PuzzleBen, a weakly
supervised benchmark that comprises 25,147 complex questions, answers, and
human-generated rationales across various domains, such as brainteasers,
puzzles, riddles, parajumbles, and critical reasoning tasks. A unique aspect of
our dataset is the inclusion of 10,000 unannotated questions, enabling us to
explore utilizing less supervised data to boost LLMs' inference capabilities.
Our experiments underscore the significance of PuzzleBen and the
effectiveness of our methodology as a promising direction for future work.
Our dataset and code will be published soon on .
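
To make the self-reinforcement recipe concrete, here is a minimal Python sketch of the loop described above. It is an illustration under stated assumptions, not the paper's released code: the hooks sft_train, generate, and preference_train are hypothetical placeholders for an SFT pass, response sampling, and a preference-learning update (e.g., a DPO-style objective), and treating the SFT model's response as preferred over the unfinetuned model's is one plausible reading of "learning from the differences in responses".

```python
"""A minimal sketch of the self-reinforcement loop described in the
abstract. The hooks below (sft_train, generate, preference_train) are
hypothetical placeholders, not the paper's actual API."""

from typing import Callable, List, Tuple

Model = object  # stand-in type for a language-model handle


def self_reinforce(
    base_model: Model,
    labeled_qa: List[Tuple[str, str]],   # small (question, annotated answer) set
    unlabeled_questions: List[str],      # e.g., the 10,000 unannotated questions
    sft_train: Callable[[Model, List[Tuple[str, str]]], Model],
    generate: Callable[[Model, str], str],
    preference_train: Callable[[Model, List[Tuple[str, str, str]]], Model],
    num_iters: int = 3,
) -> Model:
    # Step 1: Supervised Fine-Tuning on the small annotated collection.
    policy = sft_train(base_model, labeled_qa)

    # Step 2: iteratively exploit the response gap between the fine-tuned
    # policy and the unfinetuned base model on unlabeled questions.
    for _ in range(num_iters):
        pairs: List[Tuple[str, str, str]] = []
        for q in unlabeled_questions:
            chosen = generate(policy, q)        # SFT model's response
            rejected = generate(base_model, q)  # unfinetuned model's response
            # Assumption: the SFT model's response is treated as preferred;
            # the paper may additionally filter or weight these pairs.
            pairs.append((q, chosen, rejected))
        policy = preference_train(policy, pairs)  # preference-learning update
    return policy
```

The point of the design, as described in the abstract, is that human supervision is confined to the small initial SFT set; every subsequent update is driven by model-generated comparisons over unlabeled questions.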