
A survey on multimodal bidirectional machine learning translation of image and natural language processing

Wongyung Nam, Beakcheol Jang

Expert Systems with Applications (2024)

Abstract
Advances in multimodal machine learning help artificial intelligence more closely resemble human intellect, which perceives the world through multiple modalities. We survey state-of-the-art research on bidirectional machine learning translation between image and natural language processing (NLP), two modalities that account for a considerable proportion of human life. Recent advances in deep learning model architectures and learning methods in the image and NLP fields have enabled considerable progress in multimodal machine learning translation built by integrating the two. Our goal is to explore and summarize state-of-the-art research on multimodal machine learning translation and to present a taxonomy for multimodal bidirectional machine learning translation of image and NLP. Furthermore, we review the evaluation metrics and compare the state-of-the-art approaches that influence this field. We believe that this survey will become a cornerstone of future research by discussing the challenges in multimodal machine learning translation and the directions of future research, based on an understanding of state-of-the-art research in the field.
Keywords
Computer vision and natural language processing, Deep learning, Image captioning, Image synthesis, Machine learning, Multimodal