Hierarchical Multimodal Metric Learning For Multimodal Classification
30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017
Abstract
Multimodal classification arises in many computer vision tasks, such as object classification and image retrieval. The idea is to use multiple sources (modalities) that measure the same instance to improve overall performance compared with using a single source (modality). Because the modalities exhibit different characteristics, their metrics need to be learned simultaneously. In this paper, we propose a multiple-metric learning algorithm for multimodal data. The metric of each modality is a product of two matrices: one is modality-specific, and the other is constrained to be shared by all modalities. The learned metrics improve multimodal classification accuracy, and experimental results on four datasets show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics as well as other approaches tested on these datasets. Specifically, we report 95.0% object instance recognition accuracy and 89.2% object category recognition accuracy on the multi-view RGB-D dataset, and 52.3% scene category recognition accuracy on the SUN RGB-D dataset.
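To make the factorization described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each modality's metric acts as a linear transform formed by multiplying a shared matrix with a modality-specific matrix, that distances are squared Euclidean distances in the transformed space, and that per-modality distances are summed. The multiplication order, the way modalities are combined, and all names (`shared`, `specifics`, `modality_distance`, `multimodal_distance`) are illustrative assumptions that the abstract does not specify.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def modality_distance(shared, specific, x, y):
    """Squared distance ||shared @ specific @ (x - y)||^2 for one modality.

    Assumption: the modality-specific matrix is applied first and the
    shared matrix second; the abstract only says the metric is a product
    of the two.
    """
    transform = matmul(shared, specific)
    diff = [a - b for a, b in zip(x, y)]
    z = matvec(transform, diff)
    return sum(v * v for v in z)


def multimodal_distance(shared, specifics, xs, ys):
    """Sum the per-modality distances (an assumed way to combine them)."""
    return sum(
        modality_distance(shared, p, x, y)
        for p, x, y in zip(specifics, xs, ys)
    )


# Hypothetical toy setup: two modalities with 2-D and 3-D features,
# each mapped by its own matrix into a common 2-D space.
shared = [[1.0, 0.0], [0.0, 1.0]]
specifics = [
    [[2.0, 0.0], [0.0, 1.0]],            # modality 1: 2-D -> 2-D
    [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],  # modality 2: 3-D -> 2-D
]
xs = [[1.0, 1.0], [1.0, 2.0, 3.0]]
ys = [[0.0, 0.0], [0.0, 0.0, 0.0]]

print(multimodal_distance(shared, specifics, xs, ys))
```

In this sketch the shared matrix is what couples the modalities: every modality-specific transform has to map into the same intermediate space before the shared transform is applied. Learning the matrices from labeled data is the subject of the paper and is not shown here.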
Keywords
multimodal data, multimodal classification accuracy, hierarchical multimodal metric learning, computer vision tasks, object instance recognition accuracy, multi-view RGB-D dataset, scene category recognition accuracy, SUN RGB-D dataset