Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning

BioNLP, pp. 95-104, 2020.


Abstract:

This paper presents a reinforcement learning approach to extracting noise from long clinical documents for the task of readmission prediction after kidney transplant. We face the challenge of developing robust models on a small dataset in which each document may consist of over 10K tokens and is full of noise, including tabular text and task-irrelevant passages.

Introduction
  • Prediction of hospital readmission has long been recognized as an important topic in surgery.
  • It is therefore valuable to develop prediction models that utilize various sources of unstructured clinical documents.
  • The task addressed in this paper is to predict 30-day hospital readmission after kidney transplant, which the authors treat as a long document classification problem without using specific domain knowledge.
  • The data the authors use are the unstructured clinical documents of each patient up to the date of discharge.
Highlights
  • Prediction of hospital readmission has long been recognized as an important topic in surgery
  • The task addressed in this paper is to predict 30-day hospital readmission after kidney transplant, which we treat as a long document classification problem without using specific domain knowledge
  • Tuning Analysis: we find that two hyperparameters are essential to the final success of reinforcement learning (RL)
  • Quantitative Analysis: we examine tokens that are pruned by reinforcement learning and compare them with the document frequency (DF) cutoff
  • We address the task of 30-day readmission prediction after kidney transplant, and propose to improve performance by applying reinforcement learning with a noise-extraction capability
  • Empirical results show that bag-of-words is the most suitable encoder, surpassing overfitted deep learning models, and that reinforcement learning improves performance while identifying both traditional noisy tokens that appear in few documents and task-specific noisy text that appears frequently
Methods
  • The authors perform the preprocessing described in Section 2.1 and randomly split the patients in every note type into 5 folds for cross-validation.
  • Baseline (Bag-of-Words): the authors first conduct experiments using the bag-of-words encoder (BoW; Section 4.1) to establish the baseline; a minimal sketch of this setup follows the list below.
  • Experiments are performed on all note types using vanilla TF-IDF, a document frequency (DF) cutoff at 2, and token stemming.
  • Table 3 describes the cross-validation results on every note type.
  • Some note types, such as Operative (OP) and Social Worker (SW), are not as predictive as the others, with AUC under 52%.
  • Most note types have standard deviations in the range of 0.02 to 0.03
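The baseline above maps directly onto standard components. Below is a minimal sketch, assuming scikit-learn and a logistic regression classifier on top of TF-IDF as in Shin et al. (2019); the toy notes and labels are hypothetical stand-ins (EKTD is not public), and the stemming step is omitted for brevity. Setting min_df=2 reproduces the DF cutoff at 2.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    # Hypothetical stand-ins for one note type; the real documents
    # may exceed 10K tokens each.
    notes = [
        "patient stable no complications after transplant",
        "patient reports fever and pain after transplant",
        "patient stable follow up in clinic",
        "patient reports pain and nausea",
        "patient stable discharged home",
        "patient fever elevated creatinine",
        "patient stable good urine output",
        "patient pain at incision site",
        "patient stable tolerating diet",
        "patient fever readmitted for infection",
    ]
    labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 1 = readmitted within 30 days

    model = make_pipeline(
        TfidfVectorizer(min_df=2, lowercase=True),  # DF cutoff at 2
        LogisticRegression(max_iter=1000),
    )

    # 5-fold cross-validation scored by AUC, mirroring Table 3.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, notes, labels, cv=cv, scoring="roc_auc")
    print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")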
Results
  • The authors' reinforcement learning approach outperforms the best-performing models in Table 3, achieving around 1% higher AUC scores on three note types (CO, HP, and SC) while pruning up to 26% of the input documents; a sketch of the policy update is given below.
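The policy decides, segment by segment, whether to Keep or Prune before the pruned document is passed to the classifier. The following is a minimal REINFORCE-style sketch (Williams, 1992), assuming a linear logistic policy over per-segment feature vectors and a caller-supplied reward_fn; both are simplifying assumptions for illustration, not the paper's exact architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    def keep_probs(weights, segment_feats):
        # Probability of the Keep action for each segment (logistic policy).
        return 1.0 / (1.0 + np.exp(-(segment_feats @ weights)))

    def reinforce_step(weights, segment_feats, reward_fn, lr=0.1):
        # Sample Keep/Prune per segment, score the pruned document, and push
        # the weights toward the sampled actions in proportion to the reward.
        p = keep_probs(weights, segment_feats)
        keep = rng.random(p.shape) < p                 # True = Keep
        reward = reward_fn(segment_feats[keep])        # e.g. downstream AUC
        # Gradient of the log-probability of the sampled actions:
        # d/dw sum_i [a_i log p_i + (1 - a_i) log(1 - p_i)] = X^T (a - p).
        grad = segment_feats.T @ (keep.astype(float) - p)
        return weights + lr * reward * grad, reward

    def toy_reward(kept_segments):
        # Stand-in reward; in the paper the reward comes from the downstream
        # classifier's performance on the pruned document.
        return len(kept_segments) / 6.0

    w = np.zeros(4)
    feats = rng.normal(size=(6, 4))  # 6 segments, 4 hypothetical features
    for _ in range(100):
        w, r = reinforce_step(w, feats, toy_reward)

In practice a baseline term is typically subtracted from the reward to reduce variance, and the pruning ratios in Table 6 measure how much of the input the learned policy removes.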
Conclusion
  • The authors address the task of 30-day readmission prediction after kidney transplant, and propose to improve performance by applying reinforcement learning with a noise-extraction capability.
  • Empirical results show that bag-of-words is the most suitable encoder, surpassing overfitted deep learning models, and that reinforcement learning improves performance while identifying both traditional noisy tokens that appear in few documents and task-specific noisy text that appears frequently
Tables
  • Table1: Statistics of our dataset with respect to different types of clinical notes. P: # of patients, T: avg. # of tokens, CO: Consultations, DS: Discharge Summary, EC: Echocardiography, HP: History and Physical, OP: Operative, PG: Progress, SC: Selection Conference, SW: Social Worker. The report for SC is written by the committee that consists of surgeons, nephrologists, transplant coordinators, social workers, etc. at the end of the transplant evaluation. All 8 types follow an approximately 3:7 positive-negative class distribution
  • Table2: An example of tabular text in EKTD
  • Table3: The Area Under the Curve (AUC) scores achieved by different encoders on the 5-fold cross-validation. See the caption in Table 1 for the descriptions of CO, DS, EC, HP, OP, PG, SC, and SW. For deep learning encoders, only four types are selected in experiments (Section 6.2)
  • Table4: The dimensions of the feature spaces used by each BoW model with respect to the four note types. The numbers in the parentheses indicate the percentage reduction from the vanilla model, respectively
  • Table5: SEN: maximum segment length (number of tokens) allowed by the corresponding model, SEQ: average sequence length (number of segments), INST: average number of samples in the training set
  • Table6: The AUC scores and the pruning ratios of reinforcement learning (RL). Best: AUC scores from the best performing models in Table 3
  • Table7: Examples of segments pruned by the learned policy. Tokens with feature importance lower than −0.001 (toward the Prune action) are marked in bold
  • Table8: Examples of segments kept by the learned policy. Tokens with feature importance greater than 0.0005 (toward the Keep action) are marked in bold; a sketch of how such per-token importance can be computed follows below
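If, as in the sketch under Results, the policy is linear over token features, the per-token feature importance in Tables 7 and 8 has a simple reading: each token's contribution to the Keep logit is its policy weight times its feature value. The helper below is a hypothetical illustration of that reading, not necessarily the paper's exact attribution method; the thresholds are the ones quoted in the captions.

    import numpy as np

    def token_importance(weights, segment_vec, vocab):
        # Per-token contribution to the Keep logit: w_i * x_i. Positive values
        # push toward the Keep action, negative values toward Prune.
        contrib = weights * segment_vec
        return {tok: contrib[i] for tok, i in vocab.items()
                if segment_vec[i] != 0.0}

    # Hypothetical vocabulary, policy weights, and TF-IDF values.
    vocab = {"creatinine": 0, "lbs": 1, "denies": 2}
    weights = np.array([0.9, -1.4, 0.3])
    segment = np.array([0.4, 0.5, 0.0])
    print(token_importance(weights, segment, vocab))
    # Importance < -0.001 is bolded toward Prune (Table 7);
    # importance > 0.0005 is bolded toward Keep (Table 8).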
Related work
  • Shin et al. (2019) presented ensemble models utilizing both the structured and the unstructured data in EKTD, where separate logistic regression (LR) models are trained on the structured data and on each type of notes, and the final prediction for each patient is obtained by averaging the predictions from each model (a minimal sketch of this averaging is given at the end of this section). Since some patients may lack documents from certain note types, predictions on those note types are simply ignored in the averaging process. For the unstructured notes, a concatenation of Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) representations is fed into LR. However, we have found that the LDA representation contributes only marginally while taking significantly more inference time. Thus, we drop LDA and use only TF-IDF as our BoW encoder (Section 4.1).

    Various deep learning models for text classification have been proposed in recent years. Pretrained language models such as BERT have shown state-of-the-art performance on many NLP tasks (Devlin et al., 2019), and ClinicalBERT has been introduced for the medical domain (Huang et al., 2019). However, deep learning approaches have two drawbacks on this particular dataset. First, deep learning requires large datasets to train, whereas most of our unstructured note types have fewer than 2,000 samples. Second, these approaches are not designed for long documents and have difficulty keeping long-term dependencies over thousands of tokens.
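As a small illustration of the averaging described above, the per-patient ensemble is a masked mean over note types; the probabilities below are hypothetical, with NaN marking a note type for which a patient has no documents.

    import numpy as np

    # Rows: patients; columns: the 8 note types (CO, DS, EC, HP, OP, PG, SC, SW).
    per_type_probs = np.array([
        [0.62, 0.55, np.nan, 0.70, 0.48, 0.66, 0.59, np.nan],
        [np.nan, 0.41, 0.39, np.nan, 0.52, 0.44, 0.47, 0.50],
    ])

    # nanmean skips the missing note types, matching the averaging process
    # in which predictions on absent note types are simply ignored.
    final_probs = np.nanmean(per_type_probs, axis=1)
    print(final_probs)  # one readmission probability per patient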
Funding
  • We gratefully acknowledge the support of the National Institutes of Health grant R01MD011682, Reducing Disparities among Kidney Transplant Recipients
References
  • Ashutosh Adhikari, Achyudh Ram, Raphael Tang, and Jimmy Lin. 2019. Rethinking complex neural network architectures for document classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4046–4051, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Senjuti Basu Roy, Ankur Teredesai, Kiyana Zolfaghar, Rui Liu, David Hazel, Stacey Newman, and Albert Marinez. 2015. Dynamic hierarchical classification for patient risk-of-readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 1691–1700, New York, NY, USA. Association for Computing Machinery.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. 2019. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. CoRR, abs/1904.05342.
  • Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, H. Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035.
  • Caroline Jones, Robert Hollis, Tyler Wahl, Brad Oriel, Kamal Itani, Melanie Morris, and Mary Hawn. 2016. Transitional care interventions and hospital readmissions in surgical populations: A systematic review. The American Journal of Surgery, 212.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar. Association for Computational Linguistics.
  • Stephen Merity, Nitish Shirish Keskar, and Richard Socher. 2018. Regularizing and optimizing LSTM language models. In International Conference on Learning Representations.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1928–1937, New York, New York, USA. PMLR.
  • Joel Nothman, Hanmin Qin, and Roman Yurchak. 2018. Stop word lists in free open-source software packages. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 7–12, Melbourne, Australia. Association for Computational Linguistics.
  • Pengda Qin, Weiran Xu, and William Yang Wang. 2018. Robust distant supervision relation extraction via deep reinforcement learning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2137–2147, Melbourne, Australia. Association for Computational Linguistics.
  • Bonggun Shin, Julien Hogan, Andrew B. Adams, Raymond J. Lynch, and Jinho D. Choi. 2019. Multimodal ensemble approach to incorporate various types of clinical notes for predicting readmission. In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI).
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958.
  • Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. 2013. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning, ICML '13, pages III-1058–III-1066. JMLR.org.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4):229–256.
  • Tianyang Zhang, Minlie Huang, and Li Zhao. 2018. Learning structured representation for text classification via reinforcement learning. In AAAI Conference on Artificial Intelligence.