General-Purpose vs. Domain-Adapted Large Language Models for Extraction of Structured Data from Chest Radiology Reports
arXiv (2023)
Abstract
Radiologists produce unstructured data that can be valuable for clinical care
when consumed by information systems. However, variability in reporting style
limits its usage. This study compares a system using a domain-adapted language
model (RadLing) with a system using a general-purpose LLM (GPT-4) for extracting
relevant features from chest radiology reports and standardizing them to common
data elements (CDEs). Three radiologists annotated a retrospective dataset of
1399 chest X-ray (XR) reports (900 training, 499 test) and mapped them to 44
pre-selected relevant CDEs. The GPT-4 system was prompted with the report, the
feature set, the value set, and dynamic few-shot examples to extract values and
map them to CDEs. Output key:value pairs were compared with the reference
standard at both stages, and an identical match was considered a true positive
(TP). F1 score for extraction was 97
F1 score for mapping was 98
statistically significant (P<.001). RadLing's domain-adapted embeddings performed
better at feature extraction, and its lightweight mapper achieved a higher F1
score in CDE assignment. The RadLing system also demonstrated greater capability in
differentiating between absent (99
The RadLing system's domain-adapted embeddings helped improve the performance of the GPT-4
system to 92
operational advantages including local deployment and reduced runtime costs.
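The abstract states that the GPT-4 system was prompted with the report, the feature set, the value set, and dynamic few-shot examples. The following is a minimal sketch of how such a prompt could be assembled, assuming that "dynamic" few-shots means picking the training reports whose embeddings are most similar (by cosine similarity) to the input report. The `Example` class, `select_few_shots`, `build_prompt`, the feature names, the prompt wording, and the toy two-dimensional embeddings are all hypothetical; the abstract does not give the actual prompt or retrieval method, and a real system would obtain embeddings from a language model.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical annotated training report: text, reference key:value labels, embedding."""
    report: str
    labels: dict[str, str]
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_few_shots(query_emb: list[float], pool: list[Example], k: int) -> list[Example]:
    """Assumed 'dynamic' selection: the k training examples most similar to the query."""
    return sorted(pool, key=lambda ex: cosine(query_emb, ex.embedding), reverse=True)[:k]


def build_prompt(report: str, query_emb: list[float], features: list[str],
                 values: list[str], pool: list[Example], k: int = 2) -> str:
    """Assemble a hypothetical prompt: instructions, feature set, value set, shots, query."""
    lines = [
        "Extract the listed features from the chest X-ray report.",
        "Features: " + ", ".join(features),
        "Allowed values: " + ", ".join(values),
        "",
    ]
    for ex in select_few_shots(query_emb, pool, k):
        lines.append("Report: " + ex.report)
        lines.append("Output: " + "; ".join(f"{key}:{val}" for key, val in ex.labels.items()))
        lines.append("")
    lines.append("Report: " + report)
    lines.append("Output:")
    return "\n".join(lines)


# Toy usage with made-up 2-D embeddings and a hypothetical feature name.
pool = [
    Example("No pleural effusion.", {"pleural_effusion": "absent"}, [1.0, 0.0]),
    Example("Small left pleural effusion.", {"pleural_effusion": "present"}, [0.9, 0.1]),
    Example("Heart size is normal.", {"cardiomegaly": "absent"}, [0.0, 1.0]),
]
prompt = build_prompt(
    "Trace right pleural effusion.", [1.0, 0.05],
    ["pleural_effusion"], ["present", "absent", "unspecified"], pool, k=2,
)
print(prompt)
```

Selecting shots per input report, rather than using a fixed set, lets the examples shown to the model resemble the report being processed; here the two effusion reports are chosen and the unrelated cardiac report is left out.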
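Under the evaluation criterion in the abstract, an output key:value pair counts as a true positive only if it matches the reference standard exactly. The sketch below computes a micro-averaged F1 score on that basis. Treating unmatched predicted pairs as false positives, treating unmatched reference pairs as false negatives, and pooling counts across reports are assumptions, since the abstract does not specify them; the function name `exact_match_f1` and the CDE keys are hypothetical.

```python
def exact_match_f1(predictions: list[dict[str, str]],
                   references: list[dict[str, str]]) -> float:
    """Micro-averaged F1 over key:value pairs across all reports.

    A predicted pair is a true positive only if both key and value match the
    reference exactly. Assumption: unmatched predicted pairs are false positives
    and unmatched reference pairs are false negatives.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        pred_pairs, ref_pairs = set(pred.items()), set(ref.items())
        tp += len(pred_pairs & ref_pairs)
        fp += len(pred_pairs - ref_pairs)
        fn += len(ref_pairs - pred_pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical CDE keys: one wrong value in the first report.
references = [
    {"pleural_effusion": "absent", "cardiomegaly": "present"},
    {"pneumothorax": "unspecified"},
]
predictions = [
    {"pleural_effusion": "absent", "cardiomegaly": "absent"},
    {"pneumothorax": "unspecified"},
]
score = exact_match_f1(predictions, references)
print(round(score, 3))  # 0.667
```

Under exact matching, a correct key with the wrong value (here `cardiomegaly`) is penalized twice: once as a false positive and once as a false negative, giving 2 TP, 1 FP, and 1 FN, so precision, recall, and F1 are all 2/3.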