Bio
My current research centers on Recursive Reward Modeling, a scalable technique for training RL agents from human feedback. It recursively breaks down the evaluation of each task until the resulting pieces can be solved directly with reward modeling.