Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
CoRR(2024)
摘要
Large Vision-Language Models (LVLMs) are increasingly adept at generating
contextually detailed and coherent responses from visual inputs. However, their
application in multimodal decision-making and open-ended generation is hindered
by a notable rate of hallucinations, where generated text inaccurately
represents the visual contents. To address this issue, this paper introduces
the Instruction Contrastive Decoding (ICD) method, a novel approach designed to
reduce hallucinations during LVLM inference. Our method is inspired by our
observation that what we call disturbance instructions significantly exacerbate
hallucinations in multimodal fusion modules. ICD contrasts distributions from
standard and instruction disturbance, thereby increasing alignment uncertainty
and effectively subtracting hallucinated concepts from the original
distribution. Through comprehensive experiments on discriminative benchmarks
(POPE and MME) and a generative benchmark (LLaVa-Bench), we demonstrate that
ICD significantly mitigates both object-level and attribute-level
hallucinations. Moreover, our method not only addresses hallucinations but also
significantly enhances the general perception and recognition capabilities of
LVLMs.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要