Zero-shot learning of hint policy via reinforcement learning and program synthesis

EDM (2020)

Abstract
Intelligent tutoring systems for programming education can support students by providing personalized feedback when a student is stuck on a coding task. We study the problem of designing a hint policy that provides a next-step hint to a student based on their current partial solution, e.g., which line of code should be edited next. State-of-the-art techniques for designing hint policies use a supervised learning approach; however, they require access to historical student data containing trajectories of partial solutions written while solving the task successfully. These techniques are of limited applicability when feedback must be provided for a new task with no available data, or for a new student whose trajectory of partial solutions differs substantially from those seen in the historical data. To this end, we tackle the zero-shot challenge of learning a hint policy that can assist the very first student solving a task, without relying on any data. We propose a novel reinforcement learning (RL) framework that solves this challenge by leveraging recent advances in RL-based neural program synthesis. Our framework is modular and amenable to several extensions, such as designing appropriate reward functions to add a desired feature to the type of hints provided, and incorporating student data from the same or related tasks to further boost the performance of the hint policy. We demonstrate the effectiveness of our RL-based hint policy on a publicly available dataset from Code.org, the world's largest programming education platform.
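To make the hint-policy setting concrete, the sketch below frames next-step hint selection as choosing the line edit that moves a partial solution toward a correct program. This is a minimal illustrative sketch, not the paper's implementation: the state encoding (a list of source lines), the action space (single-line replacements), the toy reward, and the one-step greedy selection are all assumptions made for illustration; the paper's framework instead trains an RL policy via neural program synthesis.

```python
# Illustrative sketch only: frames next-step hint selection as an RL-style
# decision problem. All names and the reward shaping here are hypothetical.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class HintAction:
    line_index: int   # which line of the partial solution to edit next
    replacement: str  # candidate code for that line


def reward(partial_solution: List[str],
           action: HintAction,
           is_correct: Callable[[List[str]], bool]) -> float:
    """Toy reward: +1 if applying the suggested edit yields a correct program,
    a small negative step cost otherwise (encourages short hint sequences)."""
    edited = list(partial_solution)
    edited[action.line_index] = action.replacement
    return 1.0 if is_correct(edited) else -0.05


def greedy_hint(partial_solution: List[str],
                candidate_edits: List[HintAction],
                is_correct: Callable[[List[str]], bool]) -> HintAction:
    """Pick the candidate edit with the highest immediate reward.
    A trained RL policy would replace this one-step lookahead."""
    return max(candidate_edits,
               key=lambda a: reward(partial_solution, a, is_correct))


if __name__ == "__main__":
    # Hypothetical task: the student's program should print the numbers 0..4.
    student_code = ["for i in range(5):", "    print(i + 1)"]

    def is_correct(program: List[str]) -> bool:
        return program == ["for i in range(5):", "    print(i)"]

    candidates = [
        HintAction(0, "for i in range(4):"),
        HintAction(1, "    print(i)"),
    ]
    hint = greedy_hint(student_code, candidates, is_correct)
    print(f"Suggested next edit: line {hint.line_index + 1} -> {hint.replacement!r}")
```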