Fully automatic summarization of radiology reports using natural language processing with large language models

Mizuho Nishio, Takaaki Matsunaga, Hidetoshi Matsuo, Munenobu Nogami, Yasuhisa Kurata, Koji Fujimoto, Osamu Sugiyama, Toshiaki Akashi, Shigeki Aoki, Takamichi Murakami

Informatics in Medicine Unlocked (2024)

Abstract

Purpose: Natural language processing using language models has yielded promising results in various fields, and language models can help improve the workflow of radiologists. This retrospective study aimed to construct and evaluate language models for automatic summarization of radiology reports.

Methods: Two radiology report datasets were included in this study: the MIMIC Chest X-ray (MIMIC-CXR) database and the Japan Medical Image Database (JMID). MIMIC-CXR is an open database comprising chest radiograph reports. The JMID is a large database comprising computed tomography and magnetic resonance imaging reports from 10 academic medical centers in Japan. A total of 128,032 reports from MIMIC-CXR and 1,101,271 reports from the JMID were included. Four Text-to-Text Transfer Transformer (T5) models were constructed. Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a quantitative metric, was used to evaluate the quality of the summarized text on the test sets (19,205 reports from MIMIC-CXR and 58,043 from the JMID). The Wilcoxon signed-rank test was used to evaluate the differences among the ROUGE values of the four T5 models, and the best T5 models for automatic summarization were selected on the basis of this test. Moreover, subsets of the automatically summarized text in the test sets were manually evaluated by two radiologists.

Results: The quantitative metrics of the best T5 models were as follows: ROUGE-1 = 57.75 ± 30.99, ROUGE-2 = 49.96 ± 35.36, and ROUGE-L = 54.07 ± 32.48 for MIMIC-CXR; and ROUGE-1 = 50.00 ± 29.24, ROUGE-2 = 39.66 ± 30.21, and ROUGE-L = 47.87 ± 29.44 for the JMID. The radiologists' evaluations found 86% and 85% of the texts automatically summarized from MIMIC-CXR and the JMID, respectively, to be clinically useful.

Conclusion: The T5 models constructed in this study were able to perform automatic summarization of the radiology reports. The radiologists' evaluations demonstrated most of the automatically summarized texts to be clinically valuable.
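To make the ROUGE figures above concrete, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores can be computed for one reference/summary pair. It is not the evaluation code used in the study: the abstract does not state the tokenizer, stemming, or ROUGE implementation, so the whitespace tokenization, the function names (`rouge_n`, `rouge_l`), and the example sentences are all assumptions made for illustration. Whitespace splitting would also not work for Japanese reports, which need a dedicated tokenizer. Scores here are on a 0 to 1 scale; the abstract reports them multiplied by 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1 based on clipped n-gram overlap (assumed whitespace tokens)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # multiset intersection clips counts
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    # Made-up example strings, not taken from MIMIC-CXR or the JMID.
    reference = "no acute cardiopulmonary process"
    candidate = "no acute process"
    print(round(rouge_n(reference, candidate, 1), 3))
    print(round(rouge_n(reference, candidate, 2), 3))
    print(round(rouge_l(reference, candidate), 3))
```

Averaging such per-report scores over a test set gives a mean and standard deviation of the kind reported in the Results. Paired per-report scores from two models are also the sort of input a Wilcoxon signed-rank test operates on.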

Keywords

Transformer, Natural language processing, Radiology report, Summarization
