MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
CoRR (2024)
Abstract
Though reasoning abilities are considered language-agnostic, existing LLMs
exhibit inconsistent reasoning abilities across languages: e.g., reasoning in
a dominant language like English is superior to that in other languages due to
the imbalance of multilingual training data. To enhance reasoning abilities in
non-dominant languages, we propose a Multilingual-Alignment-as-Preference
Optimization framework (MAPO), aiming to align the reasoning processes in
other languages with those in the dominant language.
Specifically, we harness an off-the-shelf translation model to score the
consistency between answers in non-dominant languages and the dominant
language, and adopt this score as the preference signal for optimization,
e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization
(PPO).
Experiments show that MAPO stably achieves significant improvements in the
multilingual reasoning of various models on all three benchmarks (MSVAMP
+16.2%), with improved reasoning consistency across languages.
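The alignment-as-preference idea described above can be sketched concretely: candidate answers in a non-dominant language are ranked by a translation model's consistency score against the dominant-language answer, and the resulting (chosen, rejected) pair is fed into a DPO-style objective. The snippet below is a minimal illustration under assumed inputs; the function names, the mock log-probabilities, and the consistency scores are hypothetical, not the paper's implementation.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of each full answer under the
    trainable policy (pi_*) and a frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written stably via log1p
    return math.log1p(math.exp(-beta * margin))

# Hypothetical candidates: two answers in a non-dominant language, each
# with a consistency score produced by a translation model comparing the
# answer against the dominant-language (e.g., English) reasoning chain.
candidates = [
    {"pi": -12.0, "ref": -12.5, "consistency": 0.91},  # more consistent
    {"pi": -11.0, "ref": -11.2, "consistency": 0.40},  # less consistent
]

# The more-consistent answer becomes "chosen", the other "rejected".
chosen, rejected = sorted(candidates, key=lambda c: -c["consistency"])
loss = dpo_loss(chosen["pi"], rejected["pi"], chosen["ref"], rejected["ref"])
```

Minimizing this loss pushes the policy to place relatively more probability on answers that a translation model judges consistent with the dominant-language reasoning, without needing gold multilingual annotations.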