Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task?

Antoine Neuraz, Vincent Looten, Bastien Rance, Nicolas Daniel, Nicolas Garcelon, Leonardo Campillos Llanos, Anita Burgun, Sophie Rosset

Studies in Health Technology and Informatics (2019)

Abstract

We explore the impact of the data source on word representations for two NLP tasks in the French clinical domain (natural language understanding and text classification). We compared word embeddings (FastText) and language models (ELMo), learned either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for one of the two tasks (+7% and +8% gain in F1-score).

Keywords

Natural language processing, electronic health records