Multimodal Object Classification Using Bidirectional Gated Recurrent Unit Networks.

DSC (2018)

Abstract
In the field of artificial intelligence, there is a strong trend towards introducing external resources to aid pattern recognition tasks. In this work, we utilize external metadata in the form of web text, i.e., natural language, to aid image classification. We adopt a pre-trained convolutional neural network (CNN) to learn mid-level representations of images, and we model the text as sequential data using bidirectional gated recurrent unit (BGRU) encoders. To address the heterogeneity of the two representations, we employ a neural network to learn a shared multimodal representation. Experimental results show that the multimodal CNN-BGRU model achieves remarkable classification performance on the large-scale Pascal VOC-2007 and VOC-2012 datasets.
Keywords
Multimodal analysis, object classification, bidirectional recurrent neural network, gated recurrent units
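
The pipeline described in the abstract (image features from a CNN, text encoded by a bidirectional GRU, and a network that maps both into a shared representation) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer sizes, fusion details, or training procedure, so everything below is an assumption made for illustration. In particular, `GRUCell`, `encode_bidirectional`, `fuse`, the random weights, the omission of bias terms, the concatenation-based fusion, and the per-class sigmoid output are all hypothetical, and the image feature vector is a random stand-in for the output of a pre-trained CNN.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A standard GRU cell without bias terms (illustrative assumption)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], gated_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode_bidirectional(tokens, fwd, bwd):
    """Run one GRU left-to-right and one right-to-left, then concatenate
    their final hidden states into a single text representation."""
    h_f = [0.0] * fwd.hidden_size
    for x in tokens:
        h_f = fwd.step(x, h_f)
    h_b = [0.0] * bwd.hidden_size
    for x in reversed(tokens):
        h_b = bwd.step(x, h_b)
    return h_f + h_b


def fuse(image_feat, text_feat, W_shared, W_out):
    """Project the concatenated modalities into a shared space, then
    produce one independent score per class (multi-label assumption)."""
    shared = [math.tanh(v) for v in matvec(W_shared, image_feat + text_feat)]
    return [sigmoid(v) for v in matvec(W_out, shared)]


rng = random.Random(0)
embed_dim, hidden, img_dim, shared_dim, num_classes = 8, 6, 16, 10, 20

fwd = GRUCell(embed_dim, hidden, rng)
bwd = GRUCell(embed_dim, hidden, rng)
W_shared = rand_matrix(shared_dim, img_dim + 2 * hidden, rng)
W_out = rand_matrix(num_classes, shared_dim, rng)

# Stand-ins: five word embeddings and one CNN feature vector.
tokens = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(5)]
image_feat = [rng.uniform(0, 1) for _ in range(img_dim)]

text_feat = encode_bidirectional(tokens, fwd, bwd)
scores = fuse(image_feat, text_feat, W_shared, W_out)
print(len(text_feat), len(scores))
```

The text representation has twice the hidden size because the forward and backward final states are concatenated. The choice of 20 output classes reflects the 20 object categories of Pascal VOC; the other dimensions are arbitrary.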