Continual Learning for Instruction Following from Realtime Feedback.

CoRR (2022)

Abstract
We study the problem of continually training an instruction-following agent through feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent's instruction execution. We cast learning as a contextual bandit problem, converting the user feedback to immediate reward. We evaluate through multiple rounds of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
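To make the contextual bandit framing concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy setting with two contexts and two actions, a linear softmax policy, a simulated user who gives positive feedback when the action matches the context, and a +1/-1 reward mapping. All names (`SoftmaxBanditPolicy`, `feedback_to_reward`, `train`) are hypothetical, and the update is a plain REINFORCE-style policy gradient step chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_to_reward(positive):
    """Map binary user feedback to an immediate reward (assumed +1 / -1)."""
    return 1.0 if positive else -1.0


class SoftmaxBanditPolicy:
    """Linear softmax policy over a small discrete action set (toy stand-in)."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, context):
        scores = [sum(wi * xi for wi, xi in zip(row, context)) for row in self.w]
        return softmax(scores)

    def act(self, context, rng):
        p = self.probs(context)
        return rng.choices(range(len(p)), weights=p)[0]

    def update(self, context, action, reward):
        # Gradient of log pi(action | context) for a linear softmax policy is
        # (indicator(a == action) - pi(a | context)) * context, scaled by reward.
        p = self.probs(context)
        for a, row in enumerate(self.w):
            indicator = 1.0 if a == action else 0.0
            coeff = self.lr * reward * (indicator - p[a])
            for j, xj in enumerate(context):
                row[j] += coeff * xj


def train(rounds=200, seed=0):
    """Simulate interaction rounds: observe context, act, get feedback, update."""
    rng = random.Random(seed)
    policy = SoftmaxBanditPolicy(n_features=2, n_actions=2)
    for t in range(rounds):
        target = t % 2  # which "instruction" is given this round
        context = [1.0 if i == target else 0.0 for i in range(2)]
        action = policy.act(context, rng)
        # Simulated user: positive feedback only when the action is correct.
        reward = feedback_to_reward(action == target)
        policy.update(context, action, reward)
    return policy


if __name__ == "__main__":
    trained = train()
    print(trained.probs([1.0, 0.0]))
    print(trained.probs([0.0, 1.0]))
```

Because each reward depends only on the current context and action, no credit assignment across time steps is needed, which is what distinguishes the bandit formulation from a full reinforcement learning setup.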
Key words
learning, instruction, feedback