Universal Weighting Metric Learning for Cross-Modal Matching

CVPR 2020

Citations: 89 | Views: 419
Abstract
Cross-modal matching has been a highlighted research topic in both the vision and language areas. Learning an appropriate mining strategy to sample and weight informative pairs is crucial for cross-modal matching performance. However, most existing metric learning methods are developed for unimodal matching, which makes them unsuitable for cross-modal matching on multimodal data with heterogeneous features. To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which provides a tool to analyze the interpretability of various loss functions. Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines separate weight functions for positive and negative informative pairs. Experimental results on two image-text matching benchmarks and two video-text matching benchmarks validate the efficacy of the proposed method.
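To make the weighting idea concrete, the following is a minimal sketch of how a polynomial weight function could score positive and negative pairs. The coefficients, the margin value, and the function names are illustrative assumptions for exposition; they are not the paper's actual formulation or tuned values.

```python
def polynomial_weight(similarity, coeffs):
    """Polynomial weight G(s) = sum_i coeffs[i] * s**i.

    NOTE: the coefficient values passed in are assumptions for
    illustration, not values from the paper.
    """
    return sum(c * similarity ** i for i, c in enumerate(coeffs))


def pair_loss(pos_sim, neg_sims, pos_coeffs, neg_coeffs, margin=0.2):
    """Weight informative pairs for one anchor.

    A positive pair contributes more when its similarity is low
    (1 - pos_sim is large); a negative pair contributes only when it
    violates the margin, and more as its similarity grows.
    """
    # Positive term: weight grows polynomially as similarity drops.
    loss = polynomial_weight(1.0 - pos_sim, pos_coeffs)
    # Negative terms: only margin-violating ("informative") negatives count.
    for s in neg_sims:
        if s + margin > pos_sim:
            loss += polynomial_weight(s, neg_coeffs)
    return loss
```

For example, with linear weights `[0.0, 1.0]`, an anchor whose positive similarity is 0.9 ignores an easy negative at 0.1 but is penalized by a hard negative at 0.85, since the latter falls inside the margin.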
Keywords
weight informative pairs, cross-modal matching performance, unimodal matching, interpretable universal weighting framework, image-text matching benchmarks, video-text matching benchmarks, metric learning methods, loss functions, polynomial loss