# Multi-Level Multimodal Transformer Network for Multimodal Recipe Comprehension

Ao Liu
Shuai Yuan
Chenbin Zhang
Congjian Luo
Yaqing Liao
Kun Bai

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1781-1784.

Abstract:

Multimodal Machine Comprehension ($\mathrm{M^3C}$) is a challenging task that requires understanding both language and vision, as well as how the two modalities integrate and interact. For example, the RecipeQA challenge, which provides several $\mathrm{M^3C}$ tasks, requires deep neural models to understand textual instructions, images of different step...

