Using Learning Curve Predictions to Learn from Incorrect Feedback.

Taylor A. Kessler Faulkner, Andrea Lockerd Thomaz

ICRA (2023)

Abstract

Robots can incorporate data from human teachers when learning new tasks. However, this data can often be noisy, which can cause robots to learn slowly or not at all. One method for learning from human teachers is Human-in-the-loop Reinforcement Learning (HRL), which can combine information from both an environmental reward and external feedback from human teachers. However, many HRL methods assume near-perfect information from teachers or must know the skill level of each teacher before starting the learning process. Our algorithm, Classification for Learning Erroneous Assessments using Rewards (CLEAR), is a feedback filter for Reinforcement Learning (RL) algorithms, enabling learning agents to learn from imperfect teachers without prior modeling. CLEAR is able to determine whether human feedback is correct based on observations of the RL learning curve. Our results suggest that CLEAR improves the quality of human feedback (from 57.5% to 65% correct in a human study) and performs more reliably than baselines by matching or outperforming RL without human teachers in all tested cases.

Key words

classification for learning erroneous assessments using rewards, CLEAR, HRL, human-in-the-loop reinforcement learning, incorrect feedback, learning curve predictions, reinforcement learning algorithms, RL learning curve
