Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval

Kaiyue Pang, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song

CVPR (2020)


Abstract

ImageNet pre-training has long been considered crucial by the fine-grained sketch-based image retrieval (FG-SBIR) community due to the lack of large sketch-photo paired datasets for FG-SBIR training. In this paper, we propose a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design that are required for effective FG-SBIR pre-training. The first is formulating the puzzle in a mixed-modality fashion. The second is framing the optimisation as permutation matrix inference via Sinkhorn iterations, which we show is more effective than the common classifier formulation of jigsaw self-supervision. Experiments show that this self-supervised pre-training strategy significantly outperforms the standard ImageNet-based pipeline across all four product-level FG-SBIR benchmarks. Interestingly, it also leads to improved cross-category generalisation across both the pre-train/fine-tune and fine-tune/testing stages.


Keywords

mixed-modal Jigsaw puzzle, ImageNet pre-training, fine-grained sketch-based image retrieval, sketch-photo paired datasets, FG-SBIR training, self-supervised alternative, representation pre-training, jigsaw puzzle game, jigsaw task design, mixed-modality fashion, Jigsaw self-supervision, self-supervised pre-training, FG-SBIR pre-training, permutation matrix inference, Sinkhorn iterations, ImageNet classifier
