Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation
CoRR(2024)
Abstract
Conversational search uses multi-turn natural language contexts to
retrieve relevant passages. Existing conversational dense retrieval models
mostly view a conversation as a fixed sequence of questions and responses,
overlooking the severe data sparsity problem: users can conduct a
conversation in many different ways, yet these alternative conversations are
never recorded.
Consequently, they often struggle to generalize to diverse conversations in
real-world scenarios. In this work, we propose a framework for generalizing
Conversational dense retrieval via LLM-cognition data Augmentation (ConvAug).
ConvAug first generates multi-level augmented conversations to capture the
diverse nature of conversational contexts. Inspired by human cognition, we
devise a cognition-aware process to mitigate the generation of false positives,
false negatives, and hallucinations. Moreover, we develop a difficulty-adaptive
sample filter that selects challenging samples for complex conversations,
thereby giving the model a larger learning space. A contrastive learning
objective is then employed to train a better conversational context encoder.
Extensive experiments conducted on four public datasets, under both normal and
zero-shot settings, demonstrate the effectiveness, generalizability, and
applicability of ConvAug.
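The abstract states only that "a contrastive learning objective" trains the context encoder. A common instance of such an objective is in-batch InfoNCE, sketched below under loudly stated assumptions: this is a generic textbook loss, not necessarily the exact ConvAug formulation. The names `info_nce_loss` and `dot`, the dot-product similarity, and the `temperature` default of 0.07 are all hypothetical choices for illustration. Here each anchor embedding would be the encoding of an original conversation, and the positive at the same index would be the encoding of one of its augmented variants. Every other positive in the batch serves as a negative.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity (hypothetical choice; cosine is also common)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(
    anchors: List[List[float]],
    positives: List[List[float]],
    temperature: float = 0.07,  # hypothetical default, not taken from the paper
) -> float:
    """Generic in-batch InfoNCE loss (a sketch, not the authors' exact loss).

    anchors[i] should score highest against positives[i]; every other
    positives[j] in the batch acts as an in-batch negative for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be equal-size, non-empty batches")

    total = 0.0
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every candidate, scaled by temperature.
        logits = [dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    # Toy 2-d "embeddings" of two original conversations and their augmentations.
    originals = [[1.0, 0.0], [0.0, 1.0]]
    augmented = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(originals, augmented))
```

Minimizing this loss pulls each conversation's encoding toward its augmented variant and away from the other conversations in the batch. The max-subtraction in the log-sum-exp is there only to avoid overflow when the temperature is small.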