Multi-Level Multimodal Transformer Network for Multimodal Recipe Comprehension

Ao Liu
Shuai Yuan
Chenbin Zhang
Congjian Luo
Yaqing Liao
Kun Bai

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1781-1784.

DOI: https://doi.org/10.1145/3397271.3401247

Abstract:

Multimodal Machine Comprehension ($\rm M^3C$) has been a challenging task that requires understanding both language and vision, as well as their integration and interaction. For example, the RecipeQA challenge, which provides several $\rm M^3C$ tasks, requires deep neural models to understand textual instructions, images of different step...
