CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

arXiv (Cornell University) (2024)

Abstract
Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can be large variations in systems' susceptibility to hallucinations. To assess system hallucination robustness, hallucination ranking approaches have been developed for specific tasks such as image captioning, question answering, summarization, or biography generation. However, these approaches typically compare model outputs to gold-standard references or labels, limiting hallucination benchmarking for new domains. This work proposes "CrossCheckGPT", a reference-free universal hallucination ranking for multimodal foundation models. The core idea of CrossCheckGPT is that the same hallucinated content is unlikely to be generated by different independent systems, hence cross-system consistency can provide meaningful and accurate hallucination assessment scores. CrossCheckGPT can be applied to any model or task, provided that the information consistency between outputs can be measured through an appropriate distance metric. Focusing on multimodal large language models that generate text, we explore two information consistency measures: CrossCheck-explicit and CrossCheck-implicit. We showcase the applicability of our method for hallucination ranking across various modalities, namely the text, image, and audio-visual domains. Further, we propose the first audio-visual hallucination benchmark, "AVHalluBench", and illustrate the effectiveness of CrossCheckGPT, achieving correlations of 98% and [...] with human judgements on MHaluBench and AVHalluBench, respectively.
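
The cross-system consistency idea can be illustrated with a minimal sketch. This is not the paper's CrossCheck-explicit or CrossCheck-implicit measure, which the abstract does not specify; it substitutes a crude word-overlap distance, and the names `support`, `hallucination_score`, and `rank_systems` are hypothetical. Each system's response is scored by how poorly its sentences are supported by the responses of the other systems to the same prompt, and systems are then ranked by that score.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a real consistency model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence, evidence):
    """Fraction of the sentence's words that also appear in the evidence text."""
    words = tokens(sentence)
    if not words:
        return 1.0
    return len(words & tokens(evidence)) / len(words)


def hallucination_score(response, evidence_responses):
    """Average unsupported fraction over all sentences and evidence systems.

    Higher means less agreement with the other systems, which this sketch
    reads as a higher likelihood of hallucination.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    if not sentences or not evidence_responses:
        raise ValueError("need a non-empty response and at least one evidence response")
    total = 0.0
    for sentence in sentences:
        distances = [1.0 - support(sentence, e) for e in evidence_responses]
        total += sum(distances) / len(distances)
    return total / len(sentences)


def rank_systems(outputs):
    """Rank systems from least to most hallucination-prone.

    `outputs` maps a (hypothetical) system name to its response to one prompt.
    Each system is checked against the responses of all the other systems.
    """
    scores = {}
    for name, response in outputs.items():
        others = [r for n, r in outputs.items() if n != name]
        scores[name] = hallucination_score(response, others)
    return sorted(scores, key=scores.get), scores


if __name__ == "__main__":
    ranking, scores = rank_systems({
        "system_a": "The cat sat on the mat.",
        "system_b": "The cat sat on the mat.",
        "system_c": "A purple dragon flew over Paris.",
    })
    print(ranking, scores)
```

No reference answer is needed: the other systems' outputs act as the evidence. In practice the word-overlap distance would be replaced by a stronger information consistency measure, since the abstract only requires that consistency between outputs be measurable through an appropriate distance metric.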