Dual Instruction Tuning with Large Language Models for Mathematical Reasoning
CoRR (2024)
Abstract
Recent advancements highlight the success of instruction tuning with large
language models (LLMs) utilizing Chain-of-Thought (CoT) data for mathematical
reasoning tasks. Even with fine-tuned LLMs, challenges persist: incorrect,
missing, and redundant steps in CoT generation lead to inaccurate answer
predictions. To alleviate this problem, we propose a
dual instruction tuning strategy to meticulously model mathematical reasoning
from both forward and reverse directions. This involves introducing the
Intermediate Reasoning State Prediction task (forward reasoning) and the
Instruction Reconstruction task (reverse reasoning) to enhance the LLMs'
understanding and execution of instructions. Training instances for these tasks
are constructed based on existing mathematical instruction tuning datasets.
Subsequently, LLMs undergo multi-task fine-tuning using both existing
mathematical instructions and the newly created data. Comprehensive experiments
validate the effectiveness and domain generalization of the dual instruction
tuning strategy across various mathematical reasoning tasks.
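The abstract does not spell out how training instances for the two auxiliary tasks are constructed, so the following is only a minimal sketch of one plausible construction from an existing CoT instruction-tuning example. The function names, field layout, and the line-based step-splitting heuristic are assumptions for illustration, not the paper's actual procedure.

```python
# Hypothetical sketch: deriving dual-task training instances from one
# CoT instruction-tuning example (question, step-by-step solution).
# Field names, prompt wording, and the step-splitting heuristic are assumed.

def split_steps(solution: str) -> list[str]:
    """Naively split a CoT solution into steps by line breaks (assumed heuristic)."""
    return [s.strip() for s in solution.splitlines() if s.strip()]

def forward_instances(question: str, solution: str) -> list[dict]:
    """Intermediate Reasoning State Prediction (forward reasoning):
    given the question and a partial chain of steps, predict the next step."""
    steps = split_steps(solution)
    return [
        {
            "instruction": question,
            "input": "\n".join(steps[:i]),   # reasoning state so far
            "output": steps[i],              # next intermediate step
            "task": "intermediate_state_prediction",
        }
        for i in range(1, len(steps))
    ]

def reverse_instance(question: str, solution: str) -> dict:
    """Instruction Reconstruction (reverse reasoning):
    given the full reasoning chain, recover the original question."""
    return {
        "instruction": "Reconstruct the original question from the solution below.",
        "input": solution,
        "output": question,
        "task": "instruction_reconstruction",
    }

# The derived instances would then be mixed with the original mathematical
# instruction data for standard multi-task supervised fine-tuning of the LLM.
```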