Multi-Level Multimodal Transformer Network for Multimodal Recipe Comprehension
SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1781-1784.
Multimodal Machine Comprehension ($\rm M^3C$) has been a challenging task that requires understanding both language and vision, as well as their integration and interaction. For example, the RecipeQA challenge, which provides several $\rm M^3C$ tasks, requires deep neural models to understand textual instructions, images of different step...