Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment

IEEE Transactions on Image Processing (2022)

Abstract
Composed image retrieval aims to retrieve the desired images given a reference image and a piece of text. To handle this task, two subprocesses should be modeled: erasing the details of the reference image that are irrelevant to the text, and replenishing the details that the text asks for. Existing methods do not distinguish between the two subprocesses and instead handle them implicitly and jointly. To model the two subprocesses explicitly and in order, we propose a novel composed image retrieval method with three key components: the Multi-semantic Dynamic Suppression module (MDS), the Text-semantic Complementary Selection module (TCS), and the Semantic Space Alignment constraints (SSA). Concretely, MDS erases irrelevant details of the reference image by suppressing its semantic features. TCS selects and enhances the semantic features of the text and then replenishes them into the reference image. Finally, to facilitate the erasure and replenishment subprocesses, SSA aligns the semantics of the two modalities' features in the final space. Extensive experiments on three benchmark datasets (Shoes, FashionIQ, and Fashion200K) show that our approach outperforms state-of-the-art methods.
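The erase-then-replenish ordering described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the internals of MDS or TCS, so the sigmoid gates, the weight matrices `w_suppress` and `w_select`, and all function names below are hypothetical stand-ins chosen only to show the data flow. SSA is a training-time constraint and is not modeled here.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def erase(image_feat, text_feat, w_suppress):
    """Step 1 (erasure, hypothetical stand-in for MDS).

    A text-conditioned gate in (0, 1) scales down image feature
    dimensions; values near 0 mean "suppress this detail".
    """
    joint = np.concatenate([image_feat, text_feat], axis=-1)
    keep_gate = sigmoid(joint @ w_suppress)
    return keep_gate * image_feat


def replenish(erased_feat, text_feat, w_select):
    """Step 2 (replenishment, hypothetical stand-in for TCS).

    A gate selects which text feature dimensions to add back
    onto the erased image feature.
    """
    select_gate = sigmoid(text_feat @ w_select)
    return erased_feat + select_gate * text_feat


def compose(image_feat, text_feat, w_suppress, w_select):
    """Run erasure first, then replenishment, and normalize the query."""
    erased = erase(image_feat, text_feat, w_suppress)
    return l2_normalize(replenish(erased, text_feat, w_select))


def rank_candidates(query, candidate_feats):
    """Return candidate indices sorted by descending cosine similarity."""
    scores = l2_normalize(candidate_feats) @ l2_normalize(query)
    return np.argsort(-scores)
```

In this sketch, a composed query would be built with `compose(image_feat, text_feat, w_suppress, w_select)` and passed to `rank_candidates` together with a matrix of target-image features. In a real system the gates would be learned, and the features would come from image and text encoders rather than being supplied directly.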
Keywords
Semantics, Image retrieval, Task analysis, Kernel, Visualization, Manuals, Fuses, Composed image retrieval, multi-modal representation learning, multi-modal retrieval, embedding fusion