
Divide-and-Merge the embedding space for cross-modality person search

Neurocomputing (2021)

Abstract
This study considers the problem of text-based person search, which aims to find the person corresponding to a given text description in an image gallery. Existing methods usually learn a similarity mapping between local parts of the image and text, or embed the whole image and text into a unified embedding space. However, the relationship between the local parts and the whole remains largely underexplored. In this paper, we design a Divide-and-Merge Embedding (DME) learning framework for text-based person search. DME explicitly 1) models the relations between local parts and the global embedding, and 2) incorporates local details into the global embedding. Specifically, we design a Feature Dividing Network (FDN) that embeds the input into K locally guided semantic representations via self-attentive embedding, each of which depicts a local part of the person. We then propose a Relevance based Subspace Projection (RSP) method for merging the diverse local representations into a compact global embedding. RSP helps the model obtain a discriminative embedding by jointly minimizing the redundancy among local parts and maximizing the relevance between local parts and the global embedding. Extensive experimental results on three challenging benchmarks, i.e., the CUHK-PEDES, CUB and Flowers datasets, demonstrate the effectiveness of the proposed method.
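To make the divide-and-merge idea concrete, below is a minimal sketch of the "divide" step as self-attentive pooling into K local representations, followed by a simplified averaging "merge". The module names, dimensions, and the plain linear projection used in place of the paper's RSP are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentiveDivide(nn.Module):
    """Pool a sequence of features into K attention-weighted local representations.

    Hypothetical sketch of an FDN-style 'divide' step; not the paper's code.
    """

    def __init__(self, feat_dim=512, attn_dim=256, num_parts=6):
        super().__init__()
        # Two-layer scoring network that assigns each position K attention scores.
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, num_parts),
        )

    def forward(self, feats):
        # feats: (batch, seq_len, feat_dim) -- e.g. image region or word features
        scores = self.attn(feats)                               # (batch, seq_len, K)
        weights = F.softmax(scores, dim=1)                      # normalize over positions
        # Weighted sums give K locally guided representations per sample.
        local_parts = torch.einsum('bsk,bsd->bkd', weights, feats)  # (batch, K, feat_dim)
        return local_parts, weights


def merge_locals(local_parts, proj):
    # Simplified merge: project each local part and average into one global embedding.
    # (A stand-in for the paper's Relevance based Subspace Projection.)
    merged = proj(local_parts).mean(dim=1)                      # (batch, feat_dim)
    return F.normalize(merged, dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 20, 512)          # 2 samples, 20 region/word features each
    divide = SelfAttentiveDivide()
    parts, attn = divide(x)
    merge_proj = nn.Linear(512, 512)
    global_emb = merge_locals(parts, merge_proj)
    print(parts.shape, global_emb.shape)  # (2, 6, 512) (2, 512)
```

In this sketch each of the K attention heads attends to a different subset of positions, playing the role of a local part, while the merge step collapses them into a single compact embedding that can be compared across modalities.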
Keywords
Text-based person search, Convolutional Neural Network (CNN), Divide-and-Merge, Minimum Redundancy, Maximum Relevance