
Divide-and-Merge the embedding space for cross-modality person search

Neurocomputing (2021)

Abstract
This study considers the problem of text-based person search, which aims to find the person corresponding to a given text description in an image gallery. Existing methods usually learn a similarity mapping between local parts of the image and text, or embed the whole image and text into a unified embedding space. However, the relationship between the local parts and the whole remains largely underexplored. In this paper, we design a Divide-and-Merge Embedding (DME) learning framework for text-based person search. DME explicitly 1) models the relations between local parts and the global embedding, and 2) incorporates local details into the global embedding. Specifically, we design a Feature Dividing Network (FDN) that embeds the input into K locally guided semantic representations via self-attentive embedding, each of which depicts a local part of the person. We then propose a Relevance based Subspace Projection (RSP) method for merging the diverse local representations into a compact global embedding. RSP helps the model obtain a discriminative embedding by jointly minimizing the redundancy among local parts and maximizing the relevance between local parts and the global embedding. Extensive experimental results on three challenging benchmarks, i.e., the CUHK-PEDES, CUB and Flowers datasets, demonstrate the effectiveness of the proposed method.
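To make the divide-and-merge idea concrete, below is a minimal sketch of the "divide" step as self-attentive pooling into K local representations, followed by a simplified averaging "merge". The module names, dimensions, and the plain linear projection used in place of the paper's RSP are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentiveDivide(nn.Module):
    """Pool a sequence of features into K attention-weighted local representations.

    Hypothetical sketch of an FDN-style 'divide' step; not the paper's code.
    """

    def __init__(self, feat_dim=512, attn_dim=256, num_parts=6):
        super().__init__()
        # Two-layer scoring network that assigns each position K attention scores.
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, num_parts),
        )

    def forward(self, feats):
        # feats: (batch, seq_len, feat_dim) -- e.g. image region or word features
        scores = self.attn(feats)                               # (batch, seq_len, K)
        weights = F.softmax(scores, dim=1)                      # normalize over positions
        # Weighted sums give K locally guided representations per sample.
        local_parts = torch.einsum('bsk,bsd->bkd', weights, feats)  # (batch, K, feat_dim)
        return local_parts, weights


def merge_locals(local_parts, proj):
    # Simplified merge: project each local part and average into one global embedding.
    # (A stand-in for the paper's Relevance based Subspace Projection.)
    merged = proj(local_parts).mean(dim=1)                      # (batch, feat_dim)
    return F.normalize(merged, dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 20, 512)          # 2 samples, 20 region/word features each
    divide = SelfAttentiveDivide()
    parts, attn = divide(x)
    merge_proj = nn.Linear(512, 512)
    global_emb = merge_locals(parts, merge_proj)
    print(parts.shape, global_emb.shape)  # (2, 6, 512) (2, 512)
```

In this sketch each of the K attention heads attends to a different subset of positions, playing the role of a local part, while the merge step collapses them into a single compact embedding that can be compared across modalities.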
Keywords
Text-based person search, Convolutional Neural Network (CNN), Divide-and-Merge, Minimum Redundancy, Maximum Relevance