Dynamic interaction networks for image-text multimodal learning

Neurocomputing (2020)

Cited 4 | Views 73
Abstract
Recently, there has been a surge of interest in image-text multimodal representation learning, and many neural-network-based models have been proposed to capture the interaction between the two modalities with different forms of interaction functions. Despite their success, a potential limitation of these methods is that a single set of static parameters is insufficient to model all kinds of interactions. To alleviate this problem, we present a dynamic interaction network, in which the parameters of the interaction function are dynamically generated by a meta network. Additionally, to provide the multimodal features that the meta network needs, we propose a new neural module called the Multimodal Transformer. Experimentally, we not only conduct a comprehensive quantitative evaluation on four image-text tasks, but also present interpretable analyses of our models, revealing the internal working mechanism of the dynamic parameter learning.
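To make the core idea concrete, the sketch below illustrates one common way a meta network can generate the parameters of an interaction function per example, in the spirit described by the abstract. This is a minimal, hypothetical PyTorch sketch and not the authors' implementation; all module names, dimensions, and the specific fusion scheme are assumptions.

```python
# Minimal sketch (assumption, not the paper's code): a meta network predicts
# the weights and bias of the image-text interaction layer from a fused
# multimodal feature, instead of using a single set of static parameters.
import torch
import torch.nn as nn

class DynamicInteraction(nn.Module):
    def __init__(self, img_dim, txt_dim, hid_dim):
        super().__init__()
        self.img_dim, self.txt_dim, self.hid_dim = img_dim, txt_dim, hid_dim
        in_dim = img_dim + txt_dim
        # Meta network: maps the fused feature to a flattened weight matrix
        # and bias for the interaction function (a per-sample linear layer).
        self.meta = nn.Sequential(
            nn.Linear(in_dim, hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, in_dim * hid_dim + hid_dim),
        )

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, img_dim), txt_feat: (B, txt_dim)
        fused = torch.cat([img_feat, txt_feat], dim=-1)          # (B, D)
        params = self.meta(fused)                                # (B, D*H + H)
        D, H = self.img_dim + self.txt_dim, self.hid_dim
        W = params[:, :D * H].view(-1, D, H)                     # per-sample weights
        b = params[:, D * H:]                                    # per-sample bias
        # Interaction function applied with dynamically generated parameters.
        out = torch.bmm(fused.unsqueeze(1), W).squeeze(1) + b    # (B, H)
        return torch.relu(out)

# Usage: interaction weights are produced on the fly for each image-text pair.
model = DynamicInteraction(img_dim=2048, txt_dim=300, hid_dim=256)
img = torch.randn(4, 2048)
txt = torch.randn(4, 300)
print(model(img, txt).shape)  # torch.Size([4, 256])
```

In this sketch the fused feature doubles as the meta network's input; in the paper that role is played by the proposed Multimodal Transformer, whose details are not reproduced here.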
Keywords
Multimodal learning, Dynamic parameters prediction, Deep neural networks