Multi-Level Multimodal Transformer Network for Multimodal Recipe Comprehension
SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1781-1784.
Abstract:
Multimodal Machine Comprehension ($\rm M^3C$) has been a challenging task that requires understanding both language and vision, as well as their integration and interaction. For example, the RecipeQA challenge, which provides several $\rm M^3C$ tasks, requires deep neural models to understand textual instructions, images of different step...