CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification
arXiv (2024)
Abstract
Alzheimer's disease (AD) is an incurable neurodegenerative condition leading
to cognitive and functional deterioration. Because there is no cure, prompt and
precise diagnosis is vital, but it is a complex process that depends on multiple
factors and multi-modal data. While multi-modal representation learning has been
successfully applied to medical datasets, scant attention has
been given to 3D medical images. In this paper, we propose Contrastive Masked
Vim Autoencoder (CMViM), the first efficient representation learning method
tailored for 3D multi-modal data. Our proposed framework is built on a masked
Vim autoencoder to learn a unified multi-modal representation and the
long-range dependencies contained in 3D medical images. We also introduce an
intra-modal contrastive learning module to enhance the capability of the
multi-modal Vim encoder for modeling the discriminative features in the same
modality, and an inter-modal contrastive learning module to alleviate
misaligned representations among modalities. Our framework consists of two main
steps: 1) incorporating Vision Mamba (Vim) into the masked autoencoder to
reconstruct masked 3D multi-modal data efficiently, and 2) aligning the multi-modal
representations with contrastive learning mechanisms from both intra-modal and
inter-modal aspects. Our framework is pre-trained on the ADNI2 dataset
and validated on the downstream task of AD classification. The proposed CMViM
yields a 2.7% AUC improvement compared with other state-of-the-art
methods.
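
The two steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss formulation, mask ratio, or temperature, so the symmetric InfoNCE objective, the 0.75 mask ratio, the 0.1 temperature, and all function and variable names below are assumptions chosen for illustration. The Vim encoder is replaced by random embeddings so that the example stays self-contained.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) sets, as in masked autoencoding."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax_rows(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def inter_modal_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE (assumed loss): row i of z_a and row i of z_b form a positive pair."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    diag = np.arange(len(z_a))
    loss_ab = -log_softmax_rows(logits)[diag, diag].mean()
    loss_ba = -log_softmax_rows(logits.T)[diag, diag].mean()
    return 0.5 * (loss_ab + loss_ba)


rng = np.random.default_rng(0)

# Step 1 (sketch): choose which 3D patch tokens the encoder sees.
# 512 patches and a 0.75 mask ratio are illustrative values only.
visible, masked = random_patch_mask(num_patches=512, mask_ratio=0.75, rng=rng)

# Step 2 (sketch): align embeddings of two modalities for the same 8 subjects.
# Random vectors stand in for the encoder outputs of each modality.
z_mod_a = rng.normal(size=(8, 32))
z_mod_b_aligned = z_mod_a + 0.05 * rng.normal(size=(8, 32))
z_mod_b_shuffled = z_mod_b_aligned[::-1]  # subject pairing deliberately broken

aligned_loss = inter_modal_info_nce(z_mod_a, z_mod_b_aligned)
shuffled_loss = inter_modal_info_nce(z_mod_a, z_mod_b_shuffled)
print(len(visible), len(masked), aligned_loss < shuffled_loss)
```

In this sketch the loss is lower when the two modalities' embeddings for the same subject agree, which is the behavior an inter-modal alignment objective is meant to reward. An intra-modal variant could reuse the same function on two views of a single modality, though the abstract does not specify how those views are formed.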