Clinical Knowledge and Reasoning Abilities of Large Language Models in Pharmacy: A Comparative Study on the NAPLEX Exam.

2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2023

Abstract
This study evaluates the capabilities and limitations of three large language models (LLMs), GPT-3, GPT-4, and Bard, in the field of pharmacy by assessing their reasoning abilities on a sample of the North American Pharmacist Licensure Examination (NAPLEX). Additionally, we explore the potential impacts of LLMs on pharmacy education and practice. To evaluate the LLMs, we used a sample NAPLEX exam comprising 137 multiple-choice questions. These questions were presented to GPT-3, GPT-4, and Bard through their respective user interfaces, and the answers generated by each LLM were compared against the answer key. The results reveal a notable disparity in performance. GPT-4 emerged as the top performer, correctly answering 78.8% of the questions, an improvement of 11% over Bard and 27.7% over GPT-3. However, on questions requiring multiple selections, the performance of every LLM dropped significantly: GPT-4, GPT-3, and Bard correctly answered only 53.6%, 13.9%, and 21.4% of such questions, respectively. Among the three LLMs evaluated, GPT-4 was the only model capable of passing the NAPLEX exam. Nevertheless, given the continuous evolution of LLMs, it is reasonable to anticipate that future models will excel in this context. This highlights the significant potential of LLMs to influence the field of pharmacy, and we must therefore evaluate both the positive and negative implications of integrating LLMs into pharmacy education and practice.
Keywords
Artificial Intelligence, LLM, ChatGPT, Bard, Healthcare, Pharmacy