Information Extraction from Swedish Medical Prescriptions with Sig-Transformer Encoder

John Pougue Biyong
Bo Wang
Terry Lyons

ClinicalNLP@EMNLP, pp. 41-54, 2020.


Abstract:

Relying on large pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT) for encoding and adding a simple prediction layer has led to impressive performance in many clinical natural language processing (NLP) tasks. In this work, we present a novel extension to the Transformer architecture, by incorporating the signature transform with the self-attention mechanism.

Introduction
  • Medical prescription notes written by clinicians about patients contain valuable information that the structured part of electronic health records (EHRs) does not have.
  • More recently, with the advent of Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), fine-tuning of general-domain language models has been widely adopted for many clinical NLP tasks.
  • Many NLP applications in the medical domain can be formulated as token classification, sequence classification or sequence regression, in which a pretrained language model such as ClinicalBERT is used to encode the input token sequence and a task-specific prediction model is added on top to generate the final output (Gu et al., 2020).
  • The attention weights are computed from the dot products of the query with all keys, scaled and passed through a softmax function (sketched below).
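For illustration, here is a minimal PyTorch sketch of that scaled dot-product attention step (a standard formulation following Vaswani et al., 2017; tensor names and shapes are assumptions, not the authors' code):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(query, key, value):
        # query, key, value: (batch, seq_len, d_k)
        d_k = query.size(-1)
        # dot products of the query with all keys, scaled by sqrt(d_k)
        scores = torch.matmul(query, key.transpose(-2, -1)) / (d_k ** 0.5)
        # a softmax over the key positions turns the scores into attention weights
        weights = F.softmax(scores, dim=-1)
        # the output is the attention-weighted sum of the values
        return torch.matmul(weights, value), weights
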
Highlights
  • Medical prescription notes written by clinicians about patients contain valuable information that the structured part of electronic health records (EHRs) does not have.
  • Many natural language processing (NLP) applications in the medical domain can be formulated as token classification, sequence classification or sequence regression, in which a pretrained language model such as ClinicalBERT is used to encode the input token sequence and a task-specific prediction model is added on top to generate the final output (Gu et al., 2020).
  • Considering our data is in Swedish, we explore two different approaches for encoding the prescription notes: (1) apply Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) (Devlin et al., 2018) directly to the Swedish text; (2) translate the prescriptions to English as described in Section 4.1 and encode the translated text with ClinicalBERT (Huang et al., 2020). A minimal encoding sketch follows this list.
  • We aimed to automatically extract information related to quantity, quantity tag and the indication label from a Swedish medical prescription dataset.
  • One of our proposed models, namely M-BERT + Sig-Transformer Encoder (STE), reported the best performance for quantity and quantity tag; however, all models failed to perform on the indication task, for which we provide possible explanations.
  • A further investigation to better understand the contribution of the signature transform and the interplay between signature and attention would be very insightful.
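A minimal sketch of encoding approach (1) with the Hugging Face Transformers library is shown below (the example note, the checkpoint for approach (2) and the absence of any pooling are illustrative assumptions, not the authors' exact pipeline):

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Approach (1): encode the Swedish prescription text directly with M-BERT.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

    note = "1 tablett 2 gånger dagligen mot smärta"  # invented example note
    inputs = tokenizer(note, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_embeddings = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)

    # Approach (2) would first translate the note to English and then encode it
    # with a public clinical checkpoint (e.g. "emilyalsentzer/Bio_ClinicalBERT";
    # the exact checkpoint is an assumption) through the same AutoModel interface.
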
Methods
  • As described in Section 2, BERT and its variants have been used in a range of clinical machine learning tasks.
  • ClinicalBERT is fine-tuned on clinical notes; it embeds more domain knowledge and has proven more effective than BERT-base on several clinical tasks.
  • Any translation error would affect the performance of subsequent models.
  • The authors use the PyTorch implementation of Transformers by Hugging Face for loading M-BERT and ClinicalBERT, and Signatory (Kidger and Lyons, 2020) for differentiable computations of the signature transform (see the sketch after this list).
  • The authors think potential discrepancies in the translation of the Swedish prescriptions could affect downstream performance.
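A minimal sketch of a truncated signature computation with Signatory (tensor sizes are placeholders, not the dimensions used in the paper):

    import torch
    import signatory  # Kidger and Lyons (2020)

    # Toy stream: batch of 2 sequences, 10 time steps, 4 channels.
    path = torch.randn(2, 10, 4)

    order = 3  # order of truncation
    sig = signatory.signature(path, depth=order)  # shape: (2, 84)
    assert sig.shape[-1] == signatory.signature_channels(4, order)
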
Results
  • The results for all three tasks are summarised in Table 3.
  • The overall best performing model, M-BERT+STE, performs consistently well across different classes of QUANTITY TAG.
  • Among the three models without the use of LSTM or STE, both ClinicalBERT and Multilingual BERT (M-BERT) outperform the BERT-base model in all three tasks.
  • The class imbalance is remarkable for QUANTITY TAG and INDICATION in this dataset, as described in Section 4.
  • The strong results obtained from many models for QUANTITY TAG suggest that, despite the class imbalance, the boundaries between its classes are distinguishable and can be learnt from fewer examples (a small metric sketch follows this list).
  • The new classification becomes much less distinguishable and the new classes are less distinct, as each class contains...
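For reference, a small sketch of the two evaluation metrics named in Table 3, mean squared error for QUANTITY and macro F-1 for the classification tasks (the labels below are illustrative; only APPP and NS are class names taken from the paper):

    from sklearn.metrics import f1_score, mean_squared_error

    # QUANTITY is a regression task, scored with mean squared error.
    mse = mean_squared_error([1.0, 2.0, 0.5], [1.0, 1.5, 0.5])

    # QUANTITY TAG and INDICATION are classification tasks scored with macro F-1,
    # which averages per-class F-1 so minority classes count as much as frequent ones.
    y_true = ["NS", "NS", "APPP", "other", "NS"]
    y_pred = ["NS", "NS", "NS",   "other", "NS"]
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    print(round(mse, 3), round(macro_f1, 3))
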
Conclusion
  • In this work, the authors propose a new extension to the Transformer architecture, named Sig-Transformer Encoder (STE), by incorporating the signature transform with the self-attention mechanism.
  • The authors aimed to automatically extract information related to quantity, quantity tag and the indication label from a Swedish medical prescription dataset.
  • One of the proposed models, namely M-BERT+STE, reported the best performance for quantity and quantity tag; however, all models failed to perform on the indication task, for which the authors provide possible explanations.
  • The authors plan to apply the proposed STE models to the much larger prescription database and investigate ways to improve the labelling for indication.
  • A further investigation to better understand the contribution of the signature transform and the interplay between signature and attention would be very insightful; a loose illustration of combining the two follows this list.
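As a loose illustration only (not the authors' Sig-Transformer Encoder; the composition order, the pre-signature projection and the regression head are all assumptions), one way to pair self-attention with a truncated signature for a sequence-level prediction such as QUANTITY could look like this:

    import torch
    import torch.nn as nn
    import signatory

    class SigAttentionBlock(nn.Module):
        # Illustrative only: one possible pairing of self-attention with a
        # truncated signature, not the authors' STE architecture.
        def __init__(self, d_model=768, n_heads=8, d_presig=12, sig_order=2):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.pre_sig = nn.Linear(d_model, d_presig)  # shrink channels before the signature
            self.sig_order = sig_order
            d_sig = signatory.signature_channels(d_presig, sig_order)
            self.head = nn.Linear(d_sig, 1)  # e.g. a QUANTITY regression head

        def forward(self, token_embeddings):
            # token_embeddings: (batch, seq_len, d_model), e.g. the BERT encoder output
            attended, _ = self.attn(token_embeddings, token_embeddings, token_embeddings)
            # Truncated signature of the attended stream gives one fixed-size vector.
            sig = signatory.signature(self.pre_sig(attended), depth=self.sig_order)
            return self.head(sig)
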
Objectives
  • The authors' aim is to develop a machine learning-based system that extracts relevant information from Swedish medical prescription notes, namely quantity, quantity tag and indication.
  • The last three columns of Table 1 are the labels of interest which the authors aim to extract automatically.
  • The objective is to automatically find the QUANTITY of the prescribed medicine, as well as the QUANTITY TAG and INDICATION, for each prescription note.
Tables
  • Table 1: Example prescriptions with translations and annotations. The second column is the English translation obtained from the Google Translate API. The last three columns are the labels of interest which we aim to extract automatically. Some longer example prescriptions can be viewed in Table 8.
  • Table 2: Class distribution for QUANTITY TAG and INDICATION. APPP is an abbreviation for As Per Previous Prescription, NS stands for Not Specified and NA stands for Not Annotated.
  • Table 3: Performance comparison between our proposed STE-based approaches and baseline models. The QUANTITY task is measured in mean squared error (MSE), while for QUANTITY TAG and INDICATION we use macro F-1 score. Base refers to the original BERT-base model, M-BERT is the Multilingual BERT model pretrained on Swedish text data, and STE refers to the Sig-Transformer Encoder.
  • Table 4: Ablation study of the best performing M-BERT + STE model, removing positional encoding (PE) or signature transform (ST).
  • Table 5: The number of dimensions (d_sig) of the truncated signature is determined by the size of its input (d_presig) and the order of truncation selected (order_sig); the relation is spelled out after this list.
  • Table 6: Model performance comparison for QUANTITY and also across different classes in QUANTITY TAG. APPP: As Per Previous Prescription; NS: Not Specified.
  • Table 7: Model performance comparison across different classes in INDICATION. NA: Not Annotated.
  • Table 8: Longer example prescriptions with translations and annotations.
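The relation in Table 5 can be written out explicitly: truncating at order order_sig, an input of dimension d_presig yields d_sig = d_presig + d_presig^2 + ... + d_presig^order_sig signature dimensions. A small check against Signatory's helper:

    import signatory

    def truncated_sig_dim(d_presig, order_sig):
        # d_sig = d_presig + d_presig**2 + ... + d_presig**order_sig
        return sum(d_presig ** k for k in range(1, order_sig + 1))

    assert truncated_sig_dim(4, 3) == signatory.signature_channels(4, 3) == 84
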
Funding
  • This work was supported by the MRC Mental Health Data Pathfinder award to the University of Oxford [MC PC 17215], by the NIHR Oxford Health Biomedical Research Centre and by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
Reference
  • Emily Alsentzer, John Murphy, William Boag, WeiHung Weng, Di Jindi, Tristan Naumann, and Matthew McDermott. 2019. Publicly available clinical bert embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78.
  • Imanol Perez Arribas, Guy M Goodwin, John R Geddes, Terry Lyons, and Kate EA Saunders. 2018. A signature-based machine learning model for distinguishing bipolar disorder and borderline personality disorder. Translational psychiatry, 8(1):1–7.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
  • Juan M Banda, Martin Seneviratne, Tina HernandezBoussard, and Nigam H Shah. 2018. Advances in electronic phenotyping: from rule-based definitions to machine learning models. Annual review of biomedical data science, 1:53–68.
  • Ilya Chevyrev and Andrey Kormilitzin. 2016. A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788.
  • Thomas Davenport and Ravi Kalakota. 2019. The potential for artificial intelligence in healthcare. Future healthcare journal, 6(2):94.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. 2020. Domainspecific language model pretraining for biomedical natural language processing. arXiv preprint arXiv:2007.15779.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770– 778.
  • Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. 2020. Clinicalbert: Modeling clinical notes and predicting hospital readmission. In Proc. ACM Conference on Health, Inference, and Learning (CHIL).
  • Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-Wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035.
  • Katikapalli Subramanyam Kalyan and S Sangeetha. 2020. Secnlp: A survey of embeddings in clinical natural language processing. Journal of biomedical informatics, 101:103323.
  • Patrick Kidger, Patric Bonnier, Imanol Perez Arribas, Cristopher Salvi, and Terry Lyons. 2019. Deep signature transforms. In Advances in Neural Information Processing Systems, pages 3105–3115.
  • Patrick Kidger and Terry Lyons. 2020. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU. arXiv:2001.00706.
  • Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.
  • Terry Lyons. 2014. Rough paths, signatures and the modelling of functions on streams. In Proceedings of the International Congress of Mathematicians, pages 163–184.
  • Terry J Lyons. 1998. Differential equations driven by rough signals. Revista Matematica Iberoamericana, 14(2):215–310.
  • James H Morrill, Andrey Kormilitzin, Alejo J NevadoHolgado, Sumanth Swaminathan, Samuel D Howison, and Terry J Lyons. 2020. Utilization of the signature method to identify the early onset of sepsis from multivariate physiological time series in critical care monitoring. Critical Care Medicine, 48(10):e976–e981.
  • Yifan Peng, Shankai Yan, and Zhiyong Lu. 2019. Transfer learning in biomedical natural language processing: An evaluation of bert and elmo on ten benchmarking datasets. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 58–65.
  • Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237.
  • Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 464–468.
  • Yuqi Si, Jingqi Wang, Hua Xu, and Kirk Roberts. 2019. Enhancing clinical concept extraction with contextual embeddings. Journal of the American Medical Informatics Association, 26(11):1297–1304.
  • Csaba Toth and Harald Oberhauser. 2020. Bayesian learning from sequential data using gaussian processes with signature covariances. In Proceedings of the International Conference on Machine Learning (ICML).
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Bo Wang, Maria Liakata, Hao Ni, Terry Lyons, Alejo J Nevado-Holgado, and Kate Saunders. 2019. A path signature approach for speech emotion recognition. In Interspeech 2019, pages 1661–1665. ISCA.
  • Bo Wang, Yue Wu, Niall Taylor, Terry Lyons, Maria Liakata, Alejo J Nevado-Holgado, and Kate EA Saunders. 2020. Learning to detect bipolar disorder and borderline personality disorder with language and speech in non-clinical interviews. In Interspeech 2020. ISCA.
  • Yanshan Wang, Liwei Wang, Majid Rastegar-Mojarad, Sungrim Moon, Feichen Shen, Naveed Afzal, Sijia Liu, Yuqun Zeng, Saeed Mehrabi, Sunghwan Sohn, et al. 2018. Clinical information extraction applications: a literature review. Journal of biomedical informatics, 77:34–49.
  • Zecheng Xie, Zenghui Sun, Lianwen Jin, Hao Ni, and Terry Lyons. 2018. Learning spatial-semantic context with fully convolutional recurrent network for online handwritten chinese text recognition. IEEE transactions on pattern analysis and machine intelligence, 40(8):1903–1917.
  • Weixin Yang, Lianwen Jin, and Manfei Liu. 2016. Deepwriterid: An end-to-end online text-independent writer identification system. IEEE Intelligent Systems, 31(2):45–53.
  • Weixin Yang, Terry Lyons, Hao Ni, Cordelia Schmid, Lianwen Jin, and Jiawei Chang. 2017. Leveraging the path signature for skeleton-based human action recognition. arXiv preprint arXiv:1707.03993.