Medical Corpora Comparison Using Topic Modeling

Alevtina A. Shaikina, Anastasia A. Funkner

Procedia Computer Science (2020)

Abstract
Free-form texts from electronic medical records are often used to build predictive models for medical and healthcare processes. In different medical centers, patient treatment and other healthcare processes can follow different paths according to each hospital's internal protocols, which affects both the structure of electronic medical records and the style of their free-form texts. The aim of this paper is to compare the content of two medical corpora in order to determine whether models trained on the first corpus can be applied to the second. The approach combines topic modeling, topic segmentation, topic cross-segmentation, and a specific metric, Topic Segmentation Collation, used to compare the cross-segmentation results. The results of a word-level analysis of both corpora are also provided. We conclude that each corpus requires different word-level processing and has its own specific set of descriptions, which limits the transfer of predictive models for some diseases.

(c) 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of the 9th International Young Scientist Conference on Computational Science.
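The abstract does not describe how the word-level analysis is carried out. As a purely hypothetical illustration of what a word-level comparison of two corpora can look like, the sketch below builds a word-frequency vocabulary for each corpus, measures vocabulary overlap with the Jaccard index, and lists frequent words that occur in only one corpus. The function names, the tokenizer, the Jaccard measure, and the toy texts are all assumptions made for this example; this is not the authors' method and it is not the Topic Segmentation Collation metric.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of Unicode letters
    # (digits and underscores are dropped).
    return re.findall(r"[^\W\d_]+", text.lower())


def vocabulary(corpus):
    # Word-frequency counts over all documents in a corpus.
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc))
    return counts


def jaccard(vocab_a, vocab_b):
    # Share of distinct words the two corpora have in common.
    a, b = set(vocab_a), set(vocab_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def corpus_specific(vocab_a, vocab_b, top=5):
    # Most frequent words of corpus A that never appear in corpus B
    # (ties keep first-seen order).
    return [w for w, _ in vocab_a.most_common() if w not in vocab_b][:top]


if __name__ == "__main__":
    # Toy texts invented for illustration only.
    corpus_a = ["Patient admitted with chest pain.",
                "Chest pain resolved after treatment."]
    corpus_b = ["Patient complains of dyspnea.",
                "Dyspnea and edema noted."]
    va, vb = vocabulary(corpus_a), vocabulary(corpus_b)
    print(round(jaccard(va, vb), 3))
    print(corpus_specific(va, vb, top=2))
```

A low overlap score together with a long list of corpus-specific words would hint that the two corpora need different word-level preprocessing, which is the kind of conclusion the abstract reports; a real comparison would also need stemming or lemmatization and handling of medical abbreviations.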
Keywords
clinical texts, medical records, medical corpus, topic modeling, topic segmentation, natural language processing