Generating Radiology Reports via Memory-driven Transformer

EMNLP 2020, pp. 1439–1449.


Abstract

Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment. Writing imaging reports is time-consuming and can be error-prone for inexperienced radiologists. Therefore, automatically generating radiology reports is highly desired to lighten the workload of radiologists and accordingly promote clinical automation.

Code:

Data:

Introduction
  • Radiology report generation, which aims to automatically generate a free-text description for a clinical radiograph, has emerged as a prominent and attractive research direction in both artificial intelligence and clinical medicine.
  • It can greatly expedite the automation of workflows and improve the quality and standardization of health care.
  • Many methods have been proposed for this task.
Highlights
  • Radiology report generation, which aims to automatically generate a free-text description for a clinical radiograph, has emerged as a prominent and attractive research direction in both artificial intelligence and clinical medicine.
  • On natural language generation (NLG) metrics, both BASE+relational memory (RM) and BASE+RM+memory-driven conditional layer normalization (MCLN) outperform the vanilla Transformer (BASE) on both datasets, which confirms the validity of incorporating memory into the decoding process of the Transformer, since the highly patternized text in radiology reports is thereby modeled reasonably well.
  • Our full model achieves the best performance over all baselines on different metrics, and it significantly outperforms BASE+RM, which clearly indicates the usefulness of MCLN for incorporating memory compared with other ways of integration.
  • On the clinical efficacy (CE) metrics on MIMIC-CXR, our full model shows the same trend as on the NLG metrics, outperforming all its baselines in terms of precision, recall and F1. This observation is important because higher NLG scores do not always result in higher clinical scores.
  • We propose to generate radiology reports with a memory-driven Transformer, where a relational memory is used to record information from previous generation processes and a novel layer normalization mechanism is designed to incorporate the memory into the Transformer.
  • Experimental results on two benchmark datasets illustrate the effectiveness of the memory, whether it is concatenated with the output or integrated into different layers of the decoder via MCLN, which achieves state-of-the-art performance (a minimal MCLN sketch follows this list).
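
To make the MCLN idea above concrete, below is a minimal PyTorch sketch of a conditional layer normalization whose scale and shift are adjusted by deltas predicted from the memory. It is only an illustration under the assumption that the relational memory has been flattened into one vector per example; the class and argument names (MemoryConditionalLayerNorm, d_memory) are ours and do not reproduce the authors' released implementation.

    import torch
    import torch.nn as nn

    class MemoryConditionalLayerNorm(nn.Module):
        """Sketch of memory-driven conditional layer normalization (MCLN):
        the LayerNorm scale (gamma) and shift (beta) are adjusted by deltas
        predicted from the relational memory, so the memory reaches every
        decoder layer instead of only the final output."""

        def __init__(self, d_model, d_memory, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.gamma = nn.Parameter(torch.ones(d_model))   # base scale
            self.beta = nn.Parameter(torch.zeros(d_model))   # base shift
            self.delta_gamma = nn.Linear(d_memory, d_model)  # memory -> scale delta
            self.delta_beta = nn.Linear(d_memory, d_model)   # memory -> shift delta

        def forward(self, x, memory):
            # x: (batch, seq_len, d_model); memory: (batch, d_memory)
            mean = x.mean(dim=-1, keepdim=True)
            std = x.std(dim=-1, keepdim=True)
            gamma = self.gamma + self.delta_gamma(memory).unsqueeze(1)
            beta = self.beta + self.delta_beta(memory).unsqueeze(1)
            return gamma * (x - mean) / (std + self.eps) + beta

Used in place of the standard layer normalization inside each decoder layer, such a module is one way the memory can be integrated at every layer rather than only concatenated with the decoder output.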
Results
  • Results and Analyses

    4.1 Effect of Relational Memory

    To illustrate the effectiveness of the proposed method, the authors experiment with the aforementioned baselines on the two benchmark datasets.
  • On the CE metrics on MIMIC-CXR, the full model shows the same trend as that for NLG metrics, where it outperforms all its baselines in terms of precision, recall and F1.
  • This observation is important because higher NLG scores do not always result in higher clinical scores.
Conclusion
  • The authors propose to generate radiology reports with a memory-driven Transformer, where a relational memory is used to record information from previous generation processes and a novel layer normalization mechanism is designed to incorporate the memory into the Transformer (a relational-memory sketch follows this list).
  • Experimental results on two benchmark datasets illustrate the effectiveness of the memory, whether it is concatenated with the output or integrated into different layers of the decoder via MCLN, which achieves state-of-the-art performance.
  • Further analyses investigate how memory size affects model performance and show that the model is able to generate long reports with necessary medical terms and meaningful image-text attention mappings.
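
As a rough sketch of how a relational memory can record information from previous generation steps, the snippet below applies a multi-head-attention update to a set of memory slots and merges it with the old memory through input and forget gates, in the spirit of the description above (and of Santoro et al., 2018). The slot count, gating layout, and all names are simplified assumptions, not the authors' exact code.

    import torch
    import torch.nn as nn

    class RelationalMemory(nn.Module):
        """Sketch of a relational memory that accumulates information
        across decoding steps via attention and gating."""

        def __init__(self, num_slots, d_model, num_heads=8):
            super().__init__()
            self.init_memory = nn.Parameter(torch.randn(num_slots, d_model))
            self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            self.gate = nn.Linear(2 * d_model, 2 * d_model)  # input + forget gates

        def initial_state(self, batch_size):
            return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

        def forward(self, memory, prev_token_emb):
            # memory: (batch, num_slots, d_model); prev_token_emb: (batch, d_model)
            token = prev_token_emb.unsqueeze(1)
            keys = torch.cat([memory, token], dim=1)    # attend over memory and the new token
            update, _ = self.attn(memory, keys, keys)   # candidate memory update
            gates = self.gate(torch.cat([memory, token.expand_as(memory)], dim=-1))
            input_gate, forget_gate = gates.chunk(2, dim=-1)
            return torch.sigmoid(forget_gate) * memory + \
                   torch.sigmoid(input_gate) * torch.tanh(update)

At each decoding step the decoder consumes the updated memory (e.g. through MCLN as sketched earlier), so patterns seen earlier in the report can inform later generation.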
Tables
  • Table1: The statistics of the two benchmark datasets w.r.t. their training, validation and test sets, including the numbers of images, reports and patients, and the average word-based length (AVG. LEN.) of reports
  • Table2: The performance of all baselines and our full model on the test sets of the IU X-RAY and MIMIC-CXR datasets with respect to NLG and CE metrics. BL-n denotes BLEU score using up to n-grams; MTR and RG-L denote METEOR and ROUGE-L, respectively (a small scoring sketch follows this list). The average improvement over all NLG metrics compared to BASE is also presented in the “AVG. ∆” column. The performance of all models is averaged over five runs.
  • Table3: Comparisons of our full model with previous studies on the test sets of IU X-RAY and MIMIC-CXR with respect to NLG and CE metrics; results are either cited directly from the original papers or replicated by running their released code.
  • Table4: NLG scores of our full model on the MIMIC-CXR test set when different numbers of memory slots are used. PARA. denotes the number of parameters.
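
For reference, here is a minimal sketch of how the BLEU-1 to BLEU-4 scores reported in Table 2 can be computed with NLTK from tokenized reports. It only illustrates the metric definitions; it is not the evaluation pipeline used in the paper.

    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    def bleu_scores(references, hypotheses):
        """Corpus-level BLEU-1..4 over tokenized reports.
        references, hypotheses: lists of token lists, aligned by index."""
        smooth = SmoothingFunction().method1
        refs = [[r] for r in references]  # corpus_bleu expects a list of reference sets
        return {
            f"BLEU-{n}": corpus_bleu(refs, hypotheses,
                                     weights=tuple(1.0 / n for _ in range(n)),
                                     smoothing_function=smooth)
            for n in range(1, 5)
        }

    # Toy usage with one report pair.
    gt = ["the heart is moderately enlarged".split()]
    gen = ["the heart is mildly enlarged".split()]
    print(bleu_scores(gt, gen))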
Related work
  • The task most closely related to ours is image captioning (Vinyals et al., 2015; Xu et al., 2015; Anderson et al., 2018; Wang et al., 2019), which aims to describe images with sentences. Unlike image captioning, radiology report generation requires much longer outputs and involves other features such as recurring patterns, so the task has its own characteristics that require particular solutions. For example, Jing et al. (2018) proposed a co-attention mechanism for this task.

    Case-study example, ground-truth report: There are no old films available for comparison. The heart is moderately enlarged. There is a right IJ cordis with tip in the upper SVC. There is mild pulmonary vascular re-distribution but no definite infiltrates or effusion.
Study subjects and analysis
Patients: 63,478
  • IU X-RAY (Demner-Fushman et al., 2016): a public radiography dataset collected by Indiana University with 7,470 chest X-ray images and 3,955 reports.
  • MIMIC-CXR (Johnson et al., 2019): the largest radiology dataset to date, consisting of 473,057 chest X-ray images and 206,563 reports from 63,478 patients.
  • For both datasets, we follow Li et al. (2018) to exclude the samples without reports (a minimal filtering sketch follows).
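
A minimal sketch of the exclusion step mentioned above, assuming a hypothetical annotation file that maps each split to a list of samples with a "report" field; the actual dataset layouts and field names may differ.

    import json

    def load_split(annotation_path, split):
        """Load one split and drop samples whose report is missing or empty
        (the preprocessing described for both datasets)."""
        with open(annotation_path) as f:
            samples = json.load(f)[split]
        kept = [s for s in samples if s.get("report", "").strip()]
        print(f"{split}: kept {len(kept)} of {len(samples)} samples")
        return kept

    # Hypothetical usage:
    # train_samples = load_split("annotation.json", "train")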

Reference
  • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6077– 6086.
  • Andrea Banino, Adria Puigdomenech Badia, Raphael Koster, Martin J Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, and Charles Blundell. 2020. MEMO: A Deep Network for Flexible Combination of Episodic Memories. arXiv preprint arXiv:2001.10913.
  • Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, and Rita Cucchiara. 2019. M2: Meshed-Memory Transformer for Image Captioning. arXiv preprint arXiv:1912.08226.
  • Dina Demner-Fushman, Marc D Kohli, Marc B Rosenman, Sonya E Shooshan, Laritza Rodriguez, Sameer Antani, George R Thoma, and Clement J McDonald. 2016. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255.
  • Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems. In Proceedings of the sixth workshop on statistical machine translation, pages 85–91.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, and Yonggang Wang. 2019. ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations. arXiv preprint arXiv:1911.00720.
  • Shizhe Diao, Yan Song, and Tong Zhang. 2020. Keyphrase Generation with Cross-Document Attention. arXiv preprint arXiv:2004.09800.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
  • Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, et al. 2019. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 590–597.
  • Baoyu Jing, Zeya Wang, and Eric Xing. 2019. Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-ray Reports. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6570–6580.
  • Baoyu Jing, Pengtao Xie, and Eric Xing. 2018. On the Automatic Generation of Medical Imaging Reports. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2577–2586.
  • Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chihying Deng, Roger G Mark, and Steven Horng. 2019. MIMIC-CXR: A large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980.
  • Guillaume Lample, Alexandre Sablayrolles, Marc’Aurelio Ranzato, Ludovic Denoyer, and Herve Jegou. 2019. Large Memory Layers with Product Keys. In Advances in Neural Information Processing Systems, pages 8546–8557.
  • Christy Y Li, Xiaodan Liang, Zhiting Hu, and Eric P Xing. 2019. Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6666–6673.
  • Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric P Xing. 2018. Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation. In Advances in neural information processing systems, pages 1530–1540.
  • Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81.
  • Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, and Marzyeh Ghassemi. 2019. Clinically Accurate Chest X-Ray Report Generation. In Machine Learning for Healthcare Conference, pages 249–269.
  • Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 375– 383.
  • Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in neural information processing systems, pages 5998–6008.
  • Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. 2017. Self-critical Sequence Training for Image Captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7008–7024.
  • Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. 2018. Relational recurrent neural networks. In Advances in neural information processing systems, pages 7299–7310.
  • Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556.
  • Yan Song, Chia-Jung Lee, and Fei Xia. 2017. Learning Word Representations with Regularization from Prior Knowledge. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 143–152.
  • Yan Song and Shuming Shi. 2018. Complementary Learning of Word Embeddings. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4368–4374.
  • Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-To-End Memory Networks. In Advances in neural information processing systems, pages 2440–2448.
  • Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang, and Yonggang Wang. 2020a. Joint Chinese Word Segmentation and Partof-speech Tagging via Two-way Attentions of Autoanalyzed Knowledge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8286–8296.
  • Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and Tell: A Neural Image Caption Generator. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3156–3164.
  • Weixuan Wang, Zhihong Chen, and Haifeng Hu. 2019. Hierarchical Attention Network for Image Captioning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 8957–8964.
  • Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory Networks. CoRR, abs/1410.3916.
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning, pages 2048–2057.
  • Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R Lyu, and Irwin King. 2018. Topic Memory Networks for Short Text Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3120– 3131.
  • Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, and Dong Yu. 2019. Multiplex Word Embeddings for Selectional Preference Acquisition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5250–5259.
  • Yuanhe Tian, Yan Song, and Fei Xia. 2020b. Supertagging Combinatory Categorial Grammar with Attentive Graph Convolutional Networks. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.
  • Yuanhe Tian, Yan Song, Fei Xia, and Tong Zhang. 2020c. Improving Constituency Parsing with Span Attention. In Findings of the 2020 Conference on Empirical Methods in Natural Language Processing.
  • Yuanhe Tian, Yan Song, Fei Xia, Tong Zhang, and Yonggang Wang. 2020d. Improving Chinese Word Segmentation with Wordhood Memory Networks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8274–8285.
Author
Zhihong Chen
Yan Song
Tsung-Hui Chang
Xiang Wan