Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
EMNLP, pp. 2667-2678, 2018.
We propose to decompose instruction execution into goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a language-conditioned image generation network, and then generates the actions required to complete them. Our model is trained from demonstration only without external resour...
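The two-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the real system uses learned networks over raw images and instructions, whereas here `predict_goal` and `generate_actions` are hypothetical stand-ins. The first takes a hand-written score map in place of a learned goal distribution, and the second uses greedy grid moves in place of a learned action policy. The action names are also assumptions.

```python
import math


def predict_goal(scores):
    """Stage 1 (toy stand-in for goal prediction).

    Applies a softmax over a 2D score map and returns the most probable
    cell as (row, col) together with its probability. In the paper this
    distribution would come from a learned network, not a fixed map.
    """
    flat = [s for row in scores for s in row]
    peak = max(flat)
    exps = [math.exp(s - peak) for s in flat]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    width = len(scores[0])
    return divmod(best, width), probs[best]


def generate_actions(start, goal):
    """Stage 2 (toy stand-in for action generation).

    Moves greedily along rows, then columns, until the goal cell is
    reached, and ends with an explicit STOP action.
    """
    row, col = start
    goal_row, goal_col = goal
    actions = []
    while row != goal_row:
        step = 1 if goal_row > row else -1
        actions.append("DOWN" if step == 1 else "UP")
        row += step
    while col != goal_col:
        step = 1 if goal_col > col else -1
        actions.append("RIGHT" if step == 1 else "LEFT")
        col += step
    actions.append("STOP")
    return actions


if __name__ == "__main__":
    score_map = [[0.0, 0.0, 0.0],
                 [0.0, 0.0, 5.0],
                 [0.0, 0.0, 0.0]]
    goal, confidence = predict_goal(score_map)
    print(goal, round(confidence, 3), generate_actions((0, 0), goal))
```

Keeping the two stages behind separate functions mirrors the point of the decomposition: the goal predictor can be inspected or replaced without touching the action generator.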