Word Embeddings for Biomedical Natural Language Processing: A Survey

Language and Linguistics Compass (2020)

Abstract

Word representations are mathematical objects that capture the semantic and syntactic properties of words in a way that is interpretable by machines. Recently, encoding word properties into low-dimensional vector spaces using neural networks has become increasingly popular. Word embeddings are now used as the main input to natural language processing (NLP) applications, achieving cutting-edge results. Nevertheless, most word-embedding studies are carried out with general-domain text and evaluation datasets, and their results do not necessarily apply to text from other domains (e.g., biomedicine) that are linguistically distinct from general English. To achieve maximum benefit when using word embeddings for biomedical NLP tasks, they need to be induced and evaluated using in-domain resources. Thus, it is essential to create a detailed review of biomedical embeddings that can be used as a reference for researchers to train in-domain models. In this paper, we review biomedical word embedding studies from three key aspects: the corpora, models and evaluation methods. We first describe the characteristics of various biomedical corpora, and then compare popular embedding models. After that, we discuss different evaluation methods for biomedical embeddings. For each aspect, we summarize the various challenges discussed in the literature. Finally, we conclude the paper by proposing future directions that will help advance research into biomedical embeddings.

Keywords

biomedical NLP, evaluation, word embeddings
