Attention Alignment Multimodal LSTM for Fine-Grained Common Space Learning.


We address the problem of common space learning, which maps all related multimodal information into a common space for multimodal data. To establish a fine-grained common space, aligned relevant local information from different modalities is used to learn a common subspace, in which the projected fragments are further integrated according to intra-modal semantic relationships. Specifically, we propose a novel multimodal LSTM with an attention alignment mechanism, the attention alignment multimodal LSTM (AAM-LSTM), which mainly comprises an attentional alignment recurrent network (AA-R) and a hierarchical multimodal LSTM (HM-LSTM). Unlike traditional methods that operate directly on the full modal data, the proposed model exploits the inter-modal and intra-modal semantic relationships of local information to jointly establish a uniform representation of multimodal data. AA-R automatically captures semantically aligned local information to learn a common subspace without the need for supervised labels; HM-LSTM then leverages the potential relationships among these local pieces of information to learn a fine-grained common space. Experimental results on Flickr30K, Flickr8K, and Flickr30K Entities verify the performance and effectiveness of our model, which compares favorably with state-of-the-art methods. In particular, the phrase localization experiment with AA-R on Flickr30K Entities shows the expected accurate attention alignment. Moreover, the results on image-sentence retrieval tasks show that the proposed AAM-LSTM outperforms benchmark algorithms by a large margin.
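To make the idea of aligning local information across modalities more concrete, the following is a minimal sketch of generic soft attention between word vectors and image-region vectors that are assumed to already lie in a shared space. It is not the AA-R network from the paper: the function names (`align`, `l2_normalize`), the cosine-similarity scoring, the `temperature` parameter, and the toy vectors are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    # Scale a vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def align(words, regions, temperature=1.0):
    """For each word vector, return (attention weights over regions,
    attention-weighted region vector). Hypothetical helper, not the paper's AA-R."""
    regions_n = [l2_normalize(r) for r in regions]
    dim = len(regions_n[0])
    results = []
    for w in words:
        w_n = l2_normalize(w)
        # Cosine similarity between the word and every region (assumed scoring).
        scores = [dot(w_n, r) / temperature for r in regions_n]
        weights = softmax(scores)
        context = [
            sum(weights[i] * regions_n[i][d] for i in range(len(regions_n)))
            for d in range(dim)
        ]
        results.append((weights, context))
    return results

# Toy data (made up): two word vectors and three region vectors in a 2-D shared space.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

alignment = align(words, regions, temperature=0.1)
# Index of the most strongly attended region for each word.
best = [max(range(len(w)), key=w.__getitem__) for w, _ in alignment]
print(best)
```

In this sketch the attention weights for each word sum to one, and a lower `temperature` concentrates the weight on the single most similar region. A phrase localization setting such as the one evaluated on Flickr30K Entities would read the attended region off these weights, although the paper's actual mechanism may differ.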
Keywords
Multimodal data fusion, phrase localization, fine-grained common space, attention alignment, hierarchical multimodal LSTM