General-Purpose vs. Domain-Adapted Large Language Models for Extraction of Structured Data from Chest Radiology Reports

Ali H. Dhanaliwala,Rikhiya Ghosh,Sanjeev Kumar Karn, Poikavila Ullaskrishnan,Oladimeji Farri,Dorin Comaniciu,Charles E. Kahn

arxiv（2023）

引用 0|浏览19

暂无评分

摘要

Radiologists produce unstructured data that can be valuable for clinical care when consumed by information systems. However, variability in style limits usage. Study compares system using domain-adapted language model (RadLing) and general-purpose LLM (GPT-4) in extracting relevant features from chest radiology reports and standardizing them to common data elements (CDEs). Three radiologists annotated a retrospective dataset of 1399 chest XR reports (900 training, 499 test) and mapped to 44 pre-selected relevant CDEs. GPT-4 system was prompted with report, feature set, value set, and dynamic few-shots to extract values and map to CDEs. Output key:value pairs were compared to reference standard at both stages and an identical match was considered TP. F1 score for extraction was 97 F1 score for mapping was 98 statistically significant (P<.001). RadLing's domain-adapted embeddings were better in feature extraction and its light-weight mapper had better f1 score in CDE assignment. RadLing system also demonstrated higher capabilities in differentiating between absent (99 RadLing system's domain-adapted embeddings helped improve performance of GPT-4 system to 92 operational advantages including local deployment and reduced runtime costs.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要