“As a Radiologist”, ChatGPT-4 Gives Better Recommendations to Common Questions of Breast, Lung and Prostate Cancer: Comparison of Results (Preprint)

Haidi Lu, Qian Zhan, Mengxing Song, Qiaoling Chen, Hanchang Wu, Luguang Chen, Jianping Lu, Chengwei Shao, Chao Ma

Crossref (2024)

Abstract
BACKGROUND: Large language models such as ChatGPT potentially offer both advantages and challenges when tasked with answering disease-related questions. It is valuable to investigate whether assigning a specific role to these models, such as simulating a radiologist, leads to more appropriate responses.

OBJECTIVE: To evaluate the accuracy of ChatGPT-4 assigned the role of a radiologist (ChatGPT-4R) in answering questions related to breast, lung, and prostate cancer, and to compare its responses with the direct responses provided by ChatGPT-4.

METHODS: The study used 25, 40, and 22 common questions pertaining to breast, lung, and prostate cancer, respectively. These questions were posed to ChatGPT-4 and ChatGPT-4R to obtain responses. Five radiologists then reviewed each question and classified the resulting answers into three categories: correct, partially correct, or incorrect. The accuracy of the responses was compared using McNemar tests.

RESULTS: For questions related to breast, lung, and prostate cancer, ChatGPT-4R answered with an accuracy of 96%, 87.5%, and 100%, respectively, whereas ChatGPT-4's accuracy was 96%, 72.5%, and 95.5% for the same categories. Across all 87 questions, ChatGPT-4R achieved 93.1% correct responses, 4.6% partially correct responses, and 2.3% incorrect responses. Overall, ChatGPT-4R was more likely to provide correct answers than ChatGPT-4, with a significant difference of 8.0% (P = .02).

CONCLUSIONS: ChatGPT-4R outperforms ChatGPT-4 in terms of accuracy. However, it remains unable to answer all posed questions correctly.

CLINICALTRIAL: N/A
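A minimal sketch of the paired comparison described in the methods, using Python and statsmodels. The 2x2 contingency table below is purely illustrative (not the study's actual counts), and it assumes the correct / partially correct / incorrect grades were collapsed to correct vs. not correct for the paired McNemar test.

```python
# Illustrative McNemar test for comparing ChatGPT-4R vs. ChatGPT-4 on the
# same set of questions (paired design). Counts are hypothetical examples,
# not the data reported in the abstract.
from statsmodels.stats.contingency_tables import mcnemar

# Rows:    ChatGPT-4R correct / not correct
# Columns: ChatGPT-4  correct / not correct
table = [[70, 10],   # both correct        / only ChatGPT-4R correct
         [ 3,  4]]   # only ChatGPT-4 correct / neither correct

# The exact (binomial) version is appropriate when discordant counts are small.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic:.0f}, p-value={result.pvalue:.3f}")
```

Only the discordant cells (questions answered correctly by one model but not the other) drive the test, which is why a paired test is preferred over comparing the two overall accuracy percentages independently.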