Multimodal Object Classification Using Bidirectional Gated Recurrent Unit Networks.

DSC (2018)

Abstract

In artificial intelligence, there is a strong trend toward introducing external resources to aid pattern recognition tasks. In this work, we use external metadata in the form of web text, i.e., natural language, to aid image classification. We adopt a pre-trained convolutional neural network (CNN) to learn mid-level representations of images, and we model the text as sequential data using bidirectional gated recurrent unit (BGRU) encoders. To address the heterogeneity of the two representations, we employ a neural network to learn a shared multimodal representation. Experimental results show that the multimodal CNN-BGRU model achieves remarkable classification performance on the large-scale Pascal VOC-2007 and VOC-2012 datasets.
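The pipeline described in the abstract (image features, a bidirectional GRU over text, and a network that maps both into a shared representation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give layer sizes, the fusion rule, or the output layer, so the dimensions, the additive tanh fusion in `fuse`, the per-class sigmoid outputs, and all names below are assumptions made for illustration. Random vectors stand in for the CNN features and word embeddings, and biases and training are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A single GRU cell; each gate acts on the concatenation [x; h]."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.Wz = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
        self.Wr = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
        self.Wh = rand_matrix(hidden_dim, input_dim + hidden_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]  # update gate
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, x + reset_h)]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_bidirectional(tokens, fwd, bwd):
    """Read the sequence in both directions; concatenate the final states."""
    h_f = [0.0] * fwd.hidden_dim
    for x in tokens:
        h_f = fwd.step(x, h_f)
    h_b = [0.0] * bwd.hidden_dim
    for x in reversed(tokens):
        h_b = bwd.step(x, h_b)
    return h_f + h_b


def fuse(image_feat, text_feat, W_img, W_txt):
    """Assumed fusion: project each modality to a shared space, add, tanh."""
    a = matvec(W_img, image_feat)
    b = matvec(W_txt, text_feat)
    return [math.tanh(x + y) for x, y in zip(a, b)]


def classify(shared, W_out):
    """Assumed output layer: one independent sigmoid score per class."""
    return [sigmoid(s) for s in matvec(W_out, shared)]


rng = random.Random(0)
EMBED, HIDDEN, IMG, SHARED, CLASSES = 8, 6, 16, 10, 20  # illustrative sizes

fwd = GRUCell(EMBED, HIDDEN, rng)
bwd = GRUCell(EMBED, HIDDEN, rng)
W_img = rand_matrix(SHARED, IMG, rng)
W_txt = rand_matrix(SHARED, 2 * HIDDEN, rng)
W_out = rand_matrix(CLASSES, SHARED, rng)

# Stand-ins for real inputs: five word embeddings and one CNN feature vector.
tokens = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(5)]
image_feat = [rng.uniform(0, 1) for _ in range(IMG)]

text_feat = encode_bidirectional(tokens, fwd, bwd)
shared = fuse(image_feat, text_feat, W_img, W_txt)
scores = classify(shared, W_out)
print(len(text_feat), len(shared), len(scores))
```

The sketch uses 20 output scores because the Pascal VOC classification task has 20 object classes, and independent sigmoids because an image may contain more than one class. In practice, a deep learning framework would replace the hand-written cell and supply trained weights.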

Keywords

Multimodal analysis, object classification, bidirectional recurrent neural network, gated recurrent units
