Visual-Textual Cross-Modal Interaction Network for Radiology Report Generation

Wenfeng Zhang, Baoning Cai, Jianming Hu, Qibing Qin, Kezhen Xie

IEEE Signal Processing Letters (2024)

Abstract
The radiology report generation task produces diagnostic descriptions from radiology images, aiming to relieve radiologists of an onerous task and to alert them to abnormalities. However, data bias poses a persistent challenge: abnormal regions usually occupy only a small portion of a radiology image, yet the report generation process should pay greater attention to them. Moreover, the available data volume is small relative to what large language models require, which makes training difficult. To address these issues, we propose a Visual-textual Cross-modal Interaction Network (VCIN) to enhance the quality of generated reports. VCIN comprises two key modules: an Abundant Clinical Information Embedding (ACIE) module, which gathers rich cross-modal interaction information to promote report generation for abnormal regions, and a BERT-based Decoder-only Generator (BDG), built on the BERT architecture to mitigate training difficulties. The superior performance of the proposed model is demonstrated through experimental results on two public benchmark datasets.
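The abstract does not detail the ACIE module's internals. As a rough illustration of the cross-modal interaction it names, the sketch below shows a generic single-head cross-attention in which report-token queries attend over image-region features; all dimensions, weights, and function names are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, visual_feats, d_k=64, seed=0):
    """Single-head cross-attention sketch: textual queries attend over
    visual keys/values, fusing image-region evidence into each text
    position. Projection weights are random here purely for illustration;
    this is NOT the paper's ACIE module."""
    rng = np.random.default_rng(seed)
    d_t = text_feats.shape[-1]
    d_v = visual_feats.shape[-1]
    W_q = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    W_k = rng.standard_normal((d_v, d_k)) / np.sqrt(d_v)
    W_v = rng.standard_normal((d_v, d_k)) / np.sqrt(d_v)
    Q = text_feats @ W_q            # (T, d_k) queries from report tokens
    K = visual_feats @ W_k          # (R, d_k) keys from R image regions
    V = visual_feats @ W_v          # (R, d_k) values from image regions
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (T, R) weights over regions
    return attn @ V, attn           # fused text features, region weights

# Toy example: 5 report tokens attend over 10 image-region features.
fused, attn = cross_modal_attention(
    np.random.default_rng(1).standard_normal((5, 512)),
    np.random.default_rng(2).standard_normal((10, 2048)))
```

The attention weights give each report token a distribution over image regions, which is one common way such a model can be encouraged to focus generation on abnormal regions.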
Keywords
Abundant clinical information,cross-modal interaction,radiology report generation