Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
EMNLP, pp. 1004-1015, 2017.
We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning...