From F To A On The New York Regents Science Exams - An Overview Of The Aristo Project

AI Magazine (2020)

Abstract
Artificial intelligence has achieved remarkable mastery over games such as chess, Go, poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best artificial intelligence system could only achieve 59.3 percent on an eighth-grade science exam (Schoenick et al. 2017). This article reports success on the Grade 8 New York Regents Science Exam, where, for the first time, a system scores more than ninety percent on the exam's non-diagram, multiple-choice questions. In addition, our Aristo system, building upon the success of recent language models, exceeded eighty-three percent on the corresponding Grade 12 Science Exam's non-diagram, multiple-choice questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern natural language processing methods can achieve mastery on this task. While not a full solution to general question answering (the questions are limited to eighth-grade multiple-choice science), this result represents a significant milestone for the field.