ChatGPT takes on the European Exam in Core Cardiology: an AI success story

I. Skalidis, A. Cagnina, O. Luangphiphat, O. Muller, E. Abbe, S. Fournier

European Heart Journal (2023)

Abstract

Background: ChatGPT, the trending novel artificial intelligence model, has triggered an ongoing debate regarding its capabilities. Recently, preliminary reports showed that it answered the majority of questions on the United States Medical Licensing Examination (USMLE) correctly. However, its ability to pass a more specialized, challenging and high-stakes post-graduate test, such as the final exam for the completion of medical residency, like the European Exam in Core Cardiology (EECC), is not yet known.

Purpose: We sought to evaluate the performance of ChatGPT on the EECC, testing its capability on a more demanding, high-stakes post-graduate exam in cardiology training.

Methods: A total of 488 publicly available single-answer multiple-choice questions (MCQs) were randomly obtained from three sources traditionally used in preparation for the EECC: 88 from the sample exam questions released since 2018 on the official ESC website, 200 from the 2022 edition of StudyPRN, and 200 from Braunwald's Heart Disease Review and Assessment (BHDRA). Questions containing audio or visual assets were excluded. After filtering, 362 MCQ items (ESC sample: 68, BHDRA: 150, StudyPRN: 144) were included as the input source. False and indeterminate responses were counted as incorrect.

Results: ChatGPT answered 340 of the 362 questions, with 22 indeterminate answers in total. The overall accuracy was 58.8% across all question sources. Specifically, it demonstrated accuracies of 61.7%, 52.6% and 63.8% for the ESC sample, BHDRA and StudyPRN, respectively. It answered correctly 42/68 (4 indeterminate) of the ESC sample questions, 79/150 (11 indeterminate) of the BHDRA questions and 92/144 (7 indeterminate) of the StudyPRN questions.

Conclusion: ChatGPT manages to correctly answer the majority of the EECC's questions and performs within the passing-threshold range. Although it cannot yet process visual content, it provides rational and correct answers to text-based inputs in most scenarios. The model may be able to efficiently handle a massive amount of acquired medical knowledge, but the current approach may not substitute for critical thinking, innovation and creativity, some of the key attributes that doctors are expected to showcase.

Figures: Performance of ChatGPT on EECC; Example of MCQ input to ChatGPT
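The reported accuracies follow directly from the per-source counts of correct answers. A minimal arithmetic check (a sketch, not part of the study; note that recomputing from the stated counts and rounding to one decimal gives values a tenth of a percentage point above the per-source figures quoted in the abstract, which appear to be truncated):

```python
# Correct answers and attempted totals per question source, as reported.
counts = {
    "ESC sample": (42, 68),
    "BHDRA": (79, 150),
    "StudyPRN": (92, 144),
}

# Per-source accuracy in percent.
for source, (correct, total) in counts.items():
    print(f"{source}: {100 * correct / total:.1f}%")

# Overall accuracy across all 362 included questions: 213/362.
overall_correct = sum(c for c, _ in counts.values())
overall_total = sum(t for _, t in counts.values())
print(f"Overall: {100 * overall_correct / overall_total:.1f}%")  # 58.8%
```

The overall 58.8% figure matches the abstract exactly (213 correct out of 362 attempted items).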
Keywords
core cardiology,european exam,chatgpt,ai