A Comprehensive Study of GPT-4V’s Multimodal Capabilities in Medical Imaging

Yingshu Li, Yunyi Liu,Zhanyu Wang, Xinyu Li,Lingqiao Liu, Yuan Zhan, Leyang Cui,Zhaopeng Tu,Longyue Wang,Luping Zhou

medRxiv (Cold Spring Harbor Laboratory)（2023）

引用 0|浏览0

暂无评分

摘要

A bstract This paper presents a comprehensive evaluation of GPT-4V’s capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Visual Grounding. While prior efforts have explored GPT-4V’s performance in medical imaging, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V’s potential in generating descriptive reports for chest X-ray images, particularly when guided by well-structured prompts. However, its performance on the MIMIC-CXR dataset benchmark reveals areas for improvement in certain evaluation metrics, such as CIDEr. In the domain of Medical VQA, GPT-4V demonstrates proficiency in distinguishing between question types but falls short of prevailing benchmarks in terms of accuracy. Furthermore, our analysis finds the limitations of conventional evaluation metrics like the BLEU score, advocating for the development of more semantically robust assessment methods. In the field of Visual Grounding, GPT-4V exhibits preliminary promise in recognizing bounding boxes, but its precision is lacking, especially in identifying specific medical organs and signs. Our evaluation underscores the significant potential of GPT-4V in the medical imaging domain, while also emphasizing the need for targeted refinements to fully unlock its capabilities.

查看译文

关键词

medical imaging,multimodal capabilities

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要