Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model

Prashant D. Tailor,Timothy T. Xu,Blake H. Fortes,Raymond Iezzi,Timothy W. Olsen,Matthew R. Starr,Sophie J. Bakri, Brittni A. Scruggs,Andrew J. Barkmeier,Sanjay V. Patel,Keith H. Baratz, Ashlie A. Bernhisel,Lilly H. Wagner,Andrea A. Tooley,Gavin W. Roddy,Arthur J. Sit, Kristi Y. Wu,Erick D. Bothun,Sasha A. Mansukhani,Brian G. Mohney

Mayo Clinic Proceedings: Digital Health（2024）

引用 0|浏览2

暂无评分

摘要

Objective To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions. Patients and Methods Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. Main outcome measure was percentage of appropriate responses per subspecialty. Results For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but insignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts. Conclusion This LLM reported mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要