Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
CVPR 2024
Abstract
Recent advancements in large-scale visual-language pre-trained models have
led to significant progress in zero-/few-shot anomaly detection within natural
image domains. However, the substantial domain divergence between natural and
medical images limits the effectiveness of these methodologies in medical
anomaly detection. This paper introduces a novel lightweight multi-level
adaptation and comparison framework to repurpose the CLIP model for medical
anomaly detection. Our approach integrates multiple residual adapters into the
pre-trained visual encoder, enabling a stepwise enhancement of visual features
across different levels. This multi-level adaptation is guided by multi-level,
pixel-wise visual-language feature alignment loss functions, which recalibrate
the model's focus from object semantics in natural imagery to anomaly
identification in medical images. The adapted features exhibit improved
generalization across various medical data types, even in zero-shot scenarios
where the model encounters unseen medical modalities and anatomical regions
during training. Our experiments on medical anomaly detection benchmarks
demonstrate that our method significantly surpasses current state-of-the-art
models, with an average AUC improvement of 6.24% in anomaly classification and
2.03% in anomaly segmentation under the zero-/few-shot settings. Source code is
available at:
https://github.com/MediaBrain-SJTU/MVFA-AD
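The two core ideas in the abstract, residual adapters inserted into a frozen visual encoder and pixel-wise visual-language comparison for anomaly scoring, can be sketched compactly. The snippet below is a minimal illustration with numpy, not the authors' MVFA implementation: the adapter shapes, bottleneck width, and the two-class ("normal"/"abnormal") text-embedding comparison are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_adapter(x, w_down, w_up):
    """Bottleneck adapter: project down, ReLU, project up, add residually.
    x: (n_patches, d) patch features from one level of a frozen encoder.
    The residual connection preserves the pre-trained features while the
    small trainable projections (w_down, w_up) adapt them."""
    h = np.maximum(x @ w_down, 0.0)   # down-projection + ReLU
    return x + h @ w_up               # residual addition

def anomaly_map(patch_feats, text_normal, text_abnormal):
    """Pixel-wise visual-language comparison: cosine similarity of each
    patch feature against 'normal'/'abnormal' text embeddings, softmaxed
    into a per-patch abnormality probability."""
    f = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = np.stack([text_normal, text_abnormal])
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    logits = f @ t.T                  # (n_patches, 2) similarities
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[:, 1]                # probability of 'abnormal' per patch

# Illustrative dimensions: 64-d features, 16-d bottleneck, 7x7 patch grid.
d, r, n = 64, 16, 49
x = rng.normal(size=(n, d))
adapted = residual_adapter(x,
                           rng.normal(size=(d, r)) * 0.01,
                           rng.normal(size=(r, d)) * 0.01)
scores = anomaly_map(adapted, rng.normal(size=d), rng.normal(size=d))
print(adapted.shape, scores.shape)    # per-patch scores form the anomaly map
```

In the paper's multi-level variant, one such adapter is attached at each encoder level and each level's adapted features are aligned with the text embeddings, so per-level anomaly maps can be averaged into the final segmentation.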