CRAFT: Extracting and Tuning Cultural Instructions from the Wild
arXiv (2024)
Abstract
Large language models (LLMs) have rapidly evolved as the foundation of
various natural language processing (NLP) applications. Despite their wide range
of use cases, their understanding of culturally related concepts and reasoning remains
limited. Meanwhile, there is a significant need to enhance these models'
cultural reasoning capabilities, especially concerning underrepresented
regions. This paper introduces a novel pipeline for extracting high-quality,
culturally related instruction-tuning datasets from vast unstructured corpora.
We use a self-instruction generation pipeline to identify cultural concepts
and trigger instruction. By integrating the extracted data with a general-purpose
instruction-tuning dataset, our model demonstrates enhanced capabilities in recognizing and
understanding regional cultural nuances, thereby enhancing its reasoning
capabilities. We conduct experiments across three regions: Singapore, the
Philippines, and the United States, achieving performance improvements of up to
6
sets directly from unstructured data, setting a precedent for future
innovations in the field.