An Empirical Study on Ensemble Learning of Multimodal Machine Translation

2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM)

Abstract
With the increasing availability of images, multimodal machine translation (MMT) has become a vibrant research field. Model structure and the introduction of multimodal information are the main focal points for MMT researchers today. Among existing models, the Transformer has achieved state-of-the-art performance on many translation tasks. However, we observe that the performance of Transformer-based MMT is highly unstable, since the model is sensitive to hyper-parameter choices, especially the number of layers, the dimensions of the word embeddings and hidden states, and the number of attention heads. Moreover, different ways of introducing image information also significantly influence MMT performance. In this paper, we explore task-dependent integration strategies that make collaborative decisions on the final translation results, in order to enhance the stability of Transformer-based MMT. Furthermore, we combine different ways of introducing image information to enrich the semantic representation of the input. Extensive experiments on the Multi30K dataset demonstrate that ensemble learning in MMT, integrating text and image features, obtains more stable and better translation performance; the best result improves on the strong Transformer baseline in our experiments by 5.12 BLEU points.
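The collaborative-decision idea above can be illustrated with a common ensembling scheme: averaging the per-step output distributions of several independently trained models before picking the next token. This is a minimal sketch under our own assumptions (the paper does not specify this exact mechanism); `prob_dists` and `weights` are hypothetical names, and each row is assumed to be one model's softmax distribution over the target vocabulary at the current decoding step.

```python
import numpy as np

def ensemble_next_token(prob_dists, weights=None):
    """Combine per-model next-token distributions by (weighted) averaging.

    prob_dists: sequence of M arrays, each a softmax distribution of
                length V (one per ensemble member) for the current step.
    weights:    optional per-model weights; defaults to a uniform average.
    Returns the argmax token id and the averaged distribution.
    """
    probs = np.asarray(prob_dists, dtype=float)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    avg = np.average(probs, axis=0, weights=weights)
    return int(avg.argmax()), avg

# Three hypothetical models voting over a toy 4-token vocabulary:
token_id, avg = ensemble_next_token([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.6, 0.2, 0.1, 0.1],
])
```

In practice such averaging is applied at every decoding step inside beam search; weighting the members (e.g. by validation BLEU) is one way task-dependent integration strategies could be realized.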
Keywords
Multimodal machine translation,Transformer,Ensemble learning,Synonym-replacing,Deep learning