Multi-level Cross-modal Alignment for Image Clustering
CoRR (2024)
Abstract
Recently, cross-modal pretraining models have been employed to produce
meaningful pseudo-labels to supervise the training of an image clustering
model. However, numerous erroneous alignments in a cross-modal pretraining
model can produce poor-quality pseudo-labels and degrade clustering
performance. To address this issue, we propose a novel Multi-level
Cross-modal Alignment method that improves the alignments in a cross-modal
pretraining model for downstream tasks by building a smaller but better
semantic space and aligning images and texts at three levels: the instance
level, the prototype level, and the semantic level. Theoretical results show
that the proposed method converges and suggest effective means to reduce its
expected clustering risk. Experimental results on five benchmark datasets
show the superiority of the new method.
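
The pseudo-labeling step that the abstract takes as its starting point can be illustrated with a minimal sketch. This is not the authors' method: it only shows the generic idea of assigning each image the text concept whose embedding is most similar to it, which is where erroneous alignments can turn into poor pseudo-labels. The function name `pseudo_labels`, the `temperature` parameter, and the toy 2-D embeddings are all assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math

def normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pseudo_labels(image_embs, text_embs, temperature=0.1):
    # Hypothetical helper: for each image embedding, pick the most similar
    # text embedding and report the softmax probability as a confidence.
    texts = [normalize(t) for t in text_embs]
    labels, confidences = [], []
    for img in image_embs:
        img = normalize(img)
        # Temperature-scaled cosine similarity to every text concept.
        sims = [sum(a * b for a, b in zip(img, t)) / temperature for t in texts]
        # Numerically stable softmax over the similarities.
        m = max(sims)
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        confidences.append(probs[best])
    return labels, confidences

# Toy embeddings (made up): two text concepts and three images.
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
labels, confidences = pseudo_labels(image_embs, text_embs)
```

In this toy setup the third image lies between the two concepts, so its pseudo-label carries a lower confidence than the first image's. Ambiguous cases of this kind are where a misaligned semantic space would be expected to produce noisy supervision.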