Multi-label double-layer learning for cross-modal retrieval.

Neurocomputing (2018)

Abstract
This paper proposes a novel method named Multi-label Double-layer Learning (MDLL) for the multi-label cross-modal retrieval task. MDLL consists of two stages (layers): L2C (Label to Common) and C2L (Common to Label). In the L2C stage, since labels provide semantic information, we treat label information as an auxiliary modality and apply a covariance matrix to represent label similarity in the multi-label setting. In this way, the L2C stage maximizes the correlation between the different modalities and reduces their semantic gap. In addition, we observe that samples with the same semantic labels may have different contents from the user's point of view. To address this problem, in the C2L stage, labels are projected into a latent space learned from the image and text features, so that the label latent space is more closely related to the samples' contents and the discrepancy among samples that share labels but differ in content is reduced. In MDLL, iterating between the L2C and C2L stages greatly improves discriminative ability and narrows the gap between labels and contents. To show the effectiveness of MDLL, experiments are conducted on three multi-label cross-modal retrieval datasets (Pascal VOC 2007, NUS-WIDE, and LabelMe), on which competitive results are obtained. © 2017 Elsevier B.V. All rights reserved.
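As a rough illustration of the L2C idea above, the sketch below computes a covariance-based similarity between multi-label samples, so that samples sharing correlated (not necessarily identical) labels score as similar. This is a minimal sketch assuming a binary label-indicator matrix; the function name, the normalization, and all variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def label_covariance_similarity(Y):
    """Covariance-weighted similarity between multi-label samples.

    Y: (n_samples, n_labels) binary label-indicator matrix.
    Returns an (n_samples, n_samples) similarity matrix in which
    label co-occurrence statistics, rather than exact label match,
    determine how similar two samples are.
    (Hypothetical sketch; not the paper's exact formulation.)
    """
    C = np.cov(Y, rowvar=False)            # (n_labels, n_labels) label covariance
    S = Y @ C @ Y.T                        # propagate label covariance to sample pairs
    d = np.sqrt(np.clip(np.diag(S), 1e-12, None))
    return S / np.outer(d, d)              # normalize by per-sample scale

# Example: 4 samples, 3 labels
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)
print(label_covariance_similarity(Y))
```

Under this kind of weighting, two samples whose label sets differ but co-occur often in the training data still receive a nonzero similarity, which is the property the abstract attributes to the covariance matrix in the L2C stage.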
Keywords
Cross-modal retrieval, Multi-label, Multimedia, Partial least squares