Multi-modal Cooking Workflow Construction for Food Recipes

Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

Abstract

Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes, owing to the lack of large-scale labeled datasets. Moreover, they fail to utilize cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, achieving a performance gain of over 20% over existing hand-crafted baselines.
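To make the notion of a workflow graph concrete, the following is a minimal sketch of one possible representation: a directed acyclic graph whose nodes are cooking steps and whose edges mean "this step needs the result of that step". The recipe, the step texts, the edge list, and the helper `build_predecessors` are all hypothetical illustrations; they are not taken from MM-ReS and do not reproduce the paper's model or annotation format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recipe steps (illustrative only, not from MM-ReS).
steps = {
    1: "Chop the onions.",
    2: "Boil the pasta.",
    3: "Fry the onions in oil.",
    4: "Toss the pasta with the fried onions.",
}

# Assumed edge convention: (u, v) means step v uses the result of step u.
edges = [(1, 3), (3, 4), (2, 4)]


def build_predecessors(steps, edges):
    """Map each step id to the set of step ids it depends on."""
    preds = {step_id: set() for step_id in steps}
    for u, v in edges:
        preds[v].add(u)
    return preds


preds = build_predecessors(steps, edges)

# Any topological order of the graph is a valid execution order of the recipe.
order = list(TopologicalSorter(preds).static_order())

for step_id in order:
    print(step_id, steps[step_id])
```

In this sketch, steps 1 and 2 have no dependency on each other, so they could be carried out in parallel, which is the kind of structure a linear list of instructions leaves implicit.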

Keywords

Food Recipes, Cooking Workflow, Multi-modal Fusion, MM-ReS Dataset, Cause-and-Effect Reasoning, Deep Learning
