Semi-supervised Text-based Person Search
CoRR(2024)
Abstract
Text-based person search (TBPS) aims to retrieve images of a specific person
from a large image gallery based on a natural language description. Existing
methods rely on massive amounts of annotated image-text data to achieve
satisfactory performance under full supervision. This requirement poses a
significant challenge in practice: person images are relatively easy to
acquire from surveillance videos, whereas annotated texts are much harder to
obtain. This paper takes a pioneering step toward exploring TBPS under the
semi-supervised setting, where only a limited number of person images are
annotated with textual descriptions while the majority of images lack
annotations. We present a basic two-stage solution for semi-supervised TBPS
based on generation-then-retrieval. The generation stage enriches the
annotated data by applying an image captioning model to generate pseudo-texts
for the unannotated images. The retrieval stage then performs fully-supervised
retrieval learning on the augmented data.
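
The generation stage can be pictured as a data-preparation step that merges real and generated image-text pairs. The sketch below is illustrative only: `Sample`, `build_augmented_set`, and `toy_captioner` are hypothetical names, and the toy captioner merely stands in for a real image captioning model. Tagging each pair with an `is_pseudo` flag is an assumption, but it lets the retrieval stage treat generated texts differently from human annotations.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Sample:
    image_id: str
    text: str
    is_pseudo: bool  # True if the text was generated, not human-annotated


def build_augmented_set(
    annotated: Sequence[Tuple[str, str]],
    unannotated: Sequence[str],
    captioner: Callable[[str], str],
) -> List[Sample]:
    """Generation stage (sketch): caption unannotated images, merge with real pairs."""
    samples = [Sample(img, txt, is_pseudo=False) for img, txt in annotated]
    for img in unannotated:
        # Hypothetical captioner call; a real system would run a captioning model here.
        samples.append(Sample(img, captioner(img), is_pseudo=True))
    return samples


def toy_captioner(image_id: str) -> str:
    """Placeholder for an image captioning model (hypothetical)."""
    return f"a person in image {image_id}"


data = build_augmented_set(
    annotated=[("img_001", "a woman in a red coat carrying a black bag")],
    unannotated=["img_002", "img_003"],
    captioner=toy_captioner,
)
for s in data:
    print(s.image_id, s.is_pseudo, s.text)
```

The augmented list would then feed an ordinary fully-supervised retrieval training loop.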
Importantly, since the pseudo-texts introduce noise into retrieval learning,
we propose a noise-robust retrieval framework that strengthens the retrieval
model's ability to handle noisy data. The framework integrates two key
strategies: Hybrid Patch-Channel Masking (PC-Mask), which refines the model
architecture, and Noise-Guided Progressive Training (NP-Train), which improves
the training process. PC-Mask masks the input data at both the patch level and
the channel level to prevent overfitting to noisy supervision. NP-Train
introduces a progressive training schedule based on the noise level of the
pseudo-texts to facilitate noise-robust learning. Extensive experiments on
multiple TBPS benchmarks show that the proposed framework achieves promising
performance under the semi-supervised setting.
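
The hybrid masking idea behind PC-Mask can be illustrated on a grid of patch embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes patch-level masking drops whole patch tokens and channel-level masking zeroes the same randomly chosen feature channels in every remaining token. The function names and the mask ratios (0.3 and 0.2) are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]  # N patch tokens x D channels


def patch_mask(tokens: Matrix, ratio: float, rng: random.Random) -> Matrix:
    """Patch-level masking (assumed form): drop a random subset of whole tokens."""
    n = len(tokens)
    n_keep = max(1, round(n * (1.0 - ratio)))
    keep = sorted(rng.sample(range(n), n_keep))  # preserve original token order
    return [tokens[i] for i in keep]


def channel_mask(tokens: Matrix, ratio: float, rng: random.Random) -> Matrix:
    """Channel-level masking (assumed form): zero the same channels in every token."""
    d = len(tokens[0])
    masked = set(rng.sample(range(d), round(d * ratio)))
    return [[0.0 if c in masked else v for c, v in enumerate(tok)] for tok in tokens]


def hybrid_pc_mask(
    tokens: Matrix, patch_ratio: float = 0.3, channel_ratio: float = 0.2, seed: int = 0
) -> Matrix:
    """Apply patch-level then channel-level masking with a shared seeded RNG."""
    rng = random.Random(seed)
    return channel_mask(patch_mask(tokens, patch_ratio, rng), channel_ratio, rng)


# 10 patches with 8 channels each; values start at 1 so zeros mark masked channels.
tokens = [[float(i * 8 + c + 1) for c in range(8)] for i in range(10)]
masked = hybrid_pc_mask(tokens)
print(len(masked), "of", len(tokens), "patches kept")
```

Dropping tokens rather than zeroing them resembles MAE-style token masking and shortens the sequence; the abstract does not say whether PC-Mask drops or zeroes patches, or whether it also applies to text tokens. Such masking would presumably be active during training only.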
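
NP-Train's progressive schedule can be read as a curriculum that admits noisier pseudo-texts as training proceeds. The sketch below assumes each pseudo-text already carries a noise score in [0, 1] (the abstract does not describe how that score is estimated) and uses a hypothetical linear schedule that relaxes the admissible noise threshold from 0.3 to 1.0, while human-annotated pairs are always kept. The function names and the schedule shape are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def noise_threshold(epoch: int, total_epochs: int, start: float = 0.3) -> float:
    """Linearly relax the admissible noise level from `start` to 1.0 (assumed schedule)."""
    if total_epochs <= 1:
        return 1.0
    frac = min(epoch / (total_epochs - 1), 1.0)
    return start + (1.0 - start) * frac


def select_for_epoch(
    samples: Sequence[Tuple[str, bool, float]],  # (sample_id, is_pseudo, noise_score)
    epoch: int,
    total_epochs: int,
) -> List[str]:
    """Keep all annotated pairs; admit pseudo pairs whose noise is under the threshold."""
    tau = noise_threshold(epoch, total_epochs)
    return [sid for sid, is_pseudo, noise in samples if not is_pseudo or noise <= tau]


samples = [
    ("real_1", False, 0.0),
    ("pseudo_a", True, 0.1),
    ("pseudo_b", True, 0.5),
    ("pseudo_c", True, 0.9),
]
for epoch in range(5):
    print(epoch, select_for_epoch(samples, epoch, total_epochs=5))
```

A linear threshold is only one option; a step schedule, or reweighting the loss by noise score instead of filtering, could also realize a noise-guided progressive schedule.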