Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
arXiv (2024)
Abstract
The scaling laws and extraordinary performance of large foundation models
motivate the development and utilization of such models in biomedicine.
However, despite early promising results on some biomedical benchmarks, there
are still major challenges that need to be addressed before these models can be
used in real-world clinics. Frontier general-domain models such as GPT-4V still
have significant performance gaps in multimodal biomedical applications. More
importantly, less-acknowledged pragmatic issues, including accessibility, model
cost, and tedious manual evaluation, make it hard for clinicians to use
state-of-the-art large models directly on private patient data. Here, we
explore training open-source small multimodal models (SMMs) to bridge
competency gaps for unmet clinical needs in radiology. To maximize data
efficiency, we adopt a modular approach by incorporating state-of-the-art
pre-trained models for image and text modalities, and focusing on training a
lightweight adapter to ground each modality to the text embedding space, as
exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697
thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a
GPT-4-based metric for factuality evaluation, and demonstrate its parity with
expert evaluation. For best practice, we conduct a systematic ablation study on
various choices in data engineering and multimodal training. The resulting
LLaVA-Rad (7B) model attains state-of-the-art results on standard radiology
tasks such as report generation and cross-modal retrieval, even outperforming
much larger models such as GPT-4V and Med-PaLM M (84B). LLaVA-Rad inference
is fast and can run on a single V100 GPU in private settings, offering a
promising state-of-the-art tool for real-world clinical applications.
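
The modular design described in the abstract, a frozen image encoder and a frozen language model joined by a small trainable adapter that grounds image features in the text embedding space, can be sketched as follows. This is a minimal illustrative sketch in the LLaVA style, not the paper's exact configuration: the two-layer GELU MLP and the dimensions (1024 for the vision encoder, 4096 for the LLM) are assumptions.

```python
# Minimal sketch of a LLaVA-style modality adapter: a small MLP that
# projects frozen image-encoder patch features into the language model's
# token-embedding space. Dimensions and the two-layer GELU design are
# illustrative assumptions, not the paper's reported architecture.
import torch
import torch.nn as nn

class VisionToTextAdapter(nn.Module):
    def __init__(self, vision_dim: int = 1024, text_dim: int = 4096):
        super().__init__()
        # Only this adapter is trained; the encoder and LLM stay frozen.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from a frozen
        # image encoder. The output lives in the LLM embedding space and
        # would be prepended to the text token embeddings.
        return self.proj(patch_features)

# Example: project 196 patch embeddings for a batch of 2 images.
adapter = VisionToTextAdapter()
tokens = adapter(torch.randn(2, 196, 1024))
print(tokens.shape)  # torch.Size([2, 196, 4096])
```

Training only the adapter is what makes the approach data-efficient: the image and text backbones keep their pre-trained weights, and the adapter's few million parameters are all that must be fit on radiology image-text pairs.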
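The abstract also introduces CheXprompt, a GPT-4-based metric for factuality evaluation of generated reports. A hedged sketch of what such an evaluation loop could look like is below; the rubric text, scoring scheme, and model choice here are placeholder assumptions, since the actual CheXprompt prompt is defined in the paper. The OpenAI chat-completions call is standard, but everything else is hypothetical.

```python
# Hedged sketch of a CheXprompt-style evaluation loop: GPT-4 compares a
# generated report against a reference and returns an error count. The
# RUBRIC text is a placeholder assumption, not the paper's actual prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "You are a radiologist grading a candidate chest X-ray report against "
    "a reference report. Count clinically significant factual errors "
    "(false findings, omissions, wrong location or severity) and reply "
    "with a single integer."
)

def count_factual_errors(reference: str, candidate: str) -> int:
    # Deterministic decoding (temperature=0) keeps scores reproducible.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Reference:\n{reference}\n\nCandidate:\n{candidate}"},
        ],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())
```

Automating the error count this way replaces the tedious manual review the abstract calls out, provided (as the paper argues for CheXprompt) that the metric's judgments are shown to match expert evaluation.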