Statistical Consequences of Using Multi-armed Bandits to Conduct Adaptive Educational Experiments

Educational Data Mining (2019)

Abstract
Randomized experiments can provide key insights for improving educational technologies, but many students may experience conditions associated with inferior learning outcomes in these experiments. Multi-armed bandit (MAB) algorithms can address this issue by accumulating evidence from the experiment as it runs and modifying the experimental design to assign more helpful conditions to a greater proportion of future students. Using simulations, we explore the statistical impact of using MAB algorithms for experiment design, focusing on the tradeoff between acquiring statistically reliable information from the experiment and benefits to students. We consider how temporal biases in patterns of student behavior may impact the results of MAB experiments, and model data from ten previous educational experiments to demonstrate potential impacts of MAB assignment. Results suggest that MAB experiments can lead to much higher average benefits to students than traditional experimental designs, although at least twice as many participants are needed for acceptable statistical power. Using an optimistic prior distribution for the MAB algorithm mitigates the loss in power to some extent, without significantly reducing benefits to students. Additionally, longer experiments with MAB assignment still assign fewer students to a less effective condition than the typical practice of running a shorter experiment and then choosing one condition for all future students. Yet MAB assignment does increase false positive rates, especially if there are temporal biases in when students enter the experiment. Caution must therefore be used when interpreting results from MAB assignment in cases where students can choose when to participate in the experiment. Overall, in scenarios where student characteristics do not vary over time, MAB experimental designs can be beneficial for students and effective for reliably determining which of two differing conditions is better, given large sample sizes.
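The abstract does not name the specific bandit algorithm, prior, or statistical test. A minimal sketch of the kind of simulation it describes follows. It assumes Thompson sampling with Beta-Bernoulli posteriors as the MAB algorithm, binary student outcomes, and a pooled two-proportion z-test as the analysis, and compares uniform random assignment with MAB assignment on power and false positive rate. The success rates (0.5 vs 0.6), the 400 students, and the "optimistic" Beta(5, 1) prior are hypothetical illustration values, not the paper's settings. All function names are likewise hypothetical.

```python
import math
import random

# Assumption: Thompson sampling with Beta priors stands in for "the MAB algorithm";
# the abstract does not specify which bandit algorithm or test was used.


def thompson_assign(successes, failures, prior_a, prior_b, rng):
    """Choose a condition by drawing once from each arm's Beta posterior
    and picking the arm with the largest draw (Thompson sampling)."""
    draws = [rng.betavariate(prior_a + s, prior_b + f)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])


def run_experiment(p_true, n_students, use_mab, prior=(1.0, 1.0), seed=0):
    """Simulate one experiment with binary (Bernoulli) student outcomes.

    p_true: hypothetical true success probability of each condition.
    use_mab: True for Thompson sampling, False for uniform random assignment.
    Returns per-condition success and failure counts.
    """
    rng = random.Random(seed)
    successes = [0] * len(p_true)
    failures = [0] * len(p_true)
    for _ in range(n_students):
        if use_mab:
            arm = thompson_assign(successes, failures, prior[0], prior[1], rng)
        else:
            arm = rng.randrange(len(p_true))  # traditional uniform assignment
        if rng.random() < p_true[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def z_test_rejects(successes, failures, z_crit=1.96):
    """Two-sided pooled two-proportion z-test (alpha about 0.05) for two arms.
    It treats each arm's sample size as fixed, which adaptive assignment violates."""
    n = [s + f for s, f in zip(successes, failures)]
    if min(n) == 0:
        return False
    rates = [s / m for s, m in zip(successes, n)]
    pooled = sum(successes) / sum(n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n[0] + 1 / n[1]))
    if se == 0:
        return False
    return abs(rates[0] - rates[1]) / se > z_crit


def rejection_rate(p_true, n_students, use_mab, prior=(1.0, 1.0), n_sims=200):
    """Share of simulated experiments in which the test reports a difference:
    power when the arms truly differ, false positive rate when they do not."""
    hits = 0
    for seed in range(n_sims):
        s, f = run_experiment(p_true, n_students, use_mab, prior, seed)
        hits += z_test_rejects(s, f)
    return hits / n_sims


if __name__ == "__main__":
    effect, null = (0.5, 0.6), (0.5, 0.5)  # hypothetical success rates
    designs = [("uniform", False, (1.0, 1.0)),
               ("MAB, Beta(1,1) prior", True, (1.0, 1.0)),
               ("MAB, optimistic Beta(5,1) prior", True, (5.0, 1.0))]
    for label, use_mab, prior in designs:
        power = rejection_rate(effect, 400, use_mab, prior)
        fpr = rejection_rate(null, 400, use_mab, prior)
        print(f"{label}: power={power:.2f}, false positive rate={fpr:.2f}")
```

Running the script prints an estimated power and false positive rate for each design. Exact values depend on the seeds and the hypothetical settings above. Because the z-test assumes sample sizes fixed in advance, applying it to adaptively assigned data is one plausible route to the inflated false positive rates the abstract reports. An optimistic prior makes both arms look promising until evidence accumulates, which tends to spread early assignments across conditions.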