S3-Net: A Self-Supervised Dual-Stream Network for Radiology Report Generation

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS(2024)

引用 0|浏览4
暂无评分
摘要
Intelligent medicine is eager to automatically generate radiology reports to ease the tedious work of radiologists. Previous researches mainly focused on the text generation with encoder-decoder structure, while CNN networks for visual features ignored the long-range dependencies correlated with textual information. Besides, few studies exploit cross-modal mappings to promote radiology report generation. To alleviate the above problems, we propose a novel end-to-end radiology report generation model dubbed Self-Supervised dual-Stream Network (S3-Net). Specifically, a Dual-Stream Visual Feature Extractor (DSVFE) composed of ResNet and SwinTransformer is proposed to capture more abundant and effective visual features, where the former focuses on local response and the latter explores long-range dependencies. Then, we introduced the Fusion Alignment Module (FAM) to fuse the dual-stream visual features and facilitate alignment between visual features and text features. Furthermore, the Self-Supervised Learning with Mask(SSLM) is introduced to further enhance the visual feature representation ability. Experimental results on two mainstream radiology reporting datasets (IU X-ray and MIMIC-CXR) show that our proposed approach outperforms previous models in terms of language generation metrics.
更多
查看译文
关键词
Feature extraction,Visualization,Radiology,Data mining,Self-supervised learning,Computational modeling,Task analysis,Radiology report generation,self-supervision,dual-stream network,feature fusion,cross-modal alignment
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要