A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models
CoRR (2024)
Abstract
Medical imaging is critical to the diagnosis, surveillance, and treatment of
many health conditions, including oncological, neurological, cardiovascular,
and musculoskeletal disorders, among others. Radiologists interpret these
complex images and articulate their assessments in narrative reports that
remain largely unstructured. This free-text narrative must be
converted into a structured semantic representation to facilitate secondary
applications such as retrospective analyses or clinical decision support. Here,
we introduce the Corpus of Annotated Medical Imaging Reports (CAMIR), which
includes 609 annotated radiology reports from three imaging modality types:
Computed Tomography, Magnetic Resonance Imaging, and Positron Emission
Tomography-Computed Tomography. Reports were annotated using an event-based
schema that captures clinical indications, lesions, and medical problems. Each
event consists of a trigger and multiple arguments. For a majority of the
argument types, including anatomy, the annotated spans are normalized to
predefined concepts to facilitate secondary use. CAMIR uniquely combines a granular event structure
and concept normalization. To extract CAMIR events, we explored two BERT
(Bidirectional Encoder Representations from Transformers)-based architectures:
an existing architecture (mSpERT) that jointly extracts all event
information and a multi-step approach (PL-Marker++) that we augmented for the
CAMIR schema.
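
The event structure described above (a trigger plus multiple arguments, with some argument spans normalized to predefined concepts) might be represented roughly as follows. This is only a minimal sketch and not the CAMIR annotation format: the class names, the "Size" argument type, the "Lung" concept label, and the example sentence are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Character-offset span in a report; `end` is exclusive."""
    start: int
    end: int

    def text(self, report: str) -> str:
        return report[self.start:self.end]


@dataclass
class Argument:
    """An event argument. `concept` is the normalized label, if any."""
    arg_type: str                  # e.g. "Anatomy" (label names are assumed)
    span: Span
    concept: Optional[str] = None  # hypothetical predefined concept


@dataclass
class Event:
    """One event: a trigger span plus zero or more arguments."""
    event_type: str                # e.g. "Lesion" (label name is assumed)
    trigger: Span
    arguments: List[Argument] = field(default_factory=list)


def find_span(report: str, phrase: str) -> Span:
    """Locate the first occurrence of `phrase` and return its span."""
    start = report.index(phrase)
    return Span(start, start + len(phrase))


# Invented example sentence, not taken from the corpus.
report = "There is a 2 cm mass in the left upper lobe."
event = Event(
    event_type="Lesion",
    trigger=find_span(report, "mass"),
    arguments=[
        # Anatomy span normalized to a hypothetical concept label.
        Argument("Anatomy", find_span(report, "left upper lobe"), concept="Lung"),
        # An argument type that is left un-normalized in this sketch.
        Argument("Size", find_span(report, "2 cm")),
    ],
)
```

Keeping the raw span and the normalized concept side by side lets downstream code aggregate by concept while still being able to point back to the exact wording in the report.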