Context embedding based on Bi-LSTM in semi-supervised biomedical word sense disambiguation

IEEE Access (2019)

Abstract

Word sense disambiguation (WSD) is a basic task of natural language processing (NLP) whose purpose is to choose the correct sense of an ambiguous word according to its context. In biomedical WSD, recent research has used context embeddings built by concatenating or averaging word embeddings to represent the sense of a context. These simple linear operations on neighboring words ignore word-order information and may leave such models with flawed semantic representations. In this paper, we present a novel language model based on Bi-LSTM that embeds an entire sentential context in a continuous space while taking word order into account. We demonstrate that our language model can generate high-quality context representations in an unsupervised manner. Unlike previous work that directly predicts word senses, our model classifies a word in context by building sense embeddings, which helps us set a new state-of-the-art result (macro/micro average) on both the MSH and NLM datasets. In addition, using the same language model, we propose a semi-supervised learning method based on label propagation (LP) to reduce the dependence on labeled biomedical data. The results show that this method comes close to the state-of-the-art results produced by our Bi-LSTM model even when the amount of labeled training data is reduced.
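The objection to averaged context embeddings can be illustrated with a small sketch: averaging is order-insensitive, so two contexts containing the same words in a different order collapse to the same vector. The word vectors below are toy values chosen for illustration (an assumption, not data from the paper), and `average_context` is a hypothetical helper.

```python
# Toy 3-dimensional word vectors (hypothetical values, not from the paper).
# Dyadic fractions are used so the float sums below are exact.
WORD_VECS = {
    "cold": [1.0, 0.0, 0.25],
    "virus": [0.25, 1.0, 0.0],
    "causes": [0.0, 0.25, 1.0],
}


def average_context(tokens, vecs=WORD_VECS):
    """Baseline context embedding: the element-wise mean of the word vectors."""
    dim = len(next(iter(vecs.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(vecs[tok]):
            total[i] += x
    return [x / len(tokens) for x in total]


a = average_context(["virus", "causes", "cold"])
b = average_context(["cold", "causes", "virus"])
print(a == b)  # True: word order is lost
```

A sequence encoder such as a Bi-LSTM reads tokens in order, so permuted contexts generally map to different vectors.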
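The abstract states that the model classifies a word by building sense embeddings rather than predicting senses directly, but it does not spell out the construction. One common approach, assumed here, is to average the context embeddings of each sense's labeled examples and then pick the sense whose embedding is most cosine-similar to a new context. In this sketch the two-dimensional context vectors stand in for the Bi-LSTM output and are invented; `build_sense_embeddings` and `classify` are hypothetical names.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_sense_embeddings(examples):
    """examples: list of (context_vector, sense_label) pairs.

    Returns {sense: mean of that sense's context vectors} (assumed construction).
    """
    by_sense = defaultdict(list)
    for vec, sense in examples:
        by_sense[sense].append(vec)
    return {sense: mean(vecs) for sense, vecs in by_sense.items()}


def classify(context_vec, sense_embs):
    """Return the sense whose embedding is most cosine-similar to the context."""
    return max(sense_embs, key=lambda s: cosine(context_vec, sense_embs[s]))


# Invented context embeddings for two senses of "cold".
train = [
    ([0.9, 0.1], "cold_temperature"),
    ([0.8, 0.2], "cold_temperature"),
    ([0.1, 0.9], "common_cold"),
    ([0.2, 0.7], "common_cold"),
]
senses = build_sense_embeddings(train)
print(classify([0.15, 0.85], senses))  # common_cold
```

Because the senses are represented as vectors in the same space as the contexts, new labeled examples only update the per-sense averages and no classifier has to be retrained.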
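For the semi-supervised setting, the sketch below uses the classic iterative label-propagation scheme: build a Gaussian-kernel affinity graph over context embeddings, row-normalize it, repeatedly spread label distributions along the edges, and clamp the labeled points after each step. The abstract does not give the paper's graph construction or hyperparameters, so `sigma`, `iters`, and the toy points are assumptions, and this should not be read as the authors' exact LP variant.

```python
import math


def one_hot(label, n_classes):
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def label_propagation(points, labels, n_classes, sigma=1.0, iters=100):
    """Iterative label propagation over a fully connected Gaussian-kernel graph.

    points: list of context vectors; labels: class index, or None if unlabeled.
    Returns the predicted class index for every point.
    """
    n = len(points)
    # Affinity w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Row-normalize to transition probabilities T = D^-1 W (guard empty rows).
    t = []
    for row in w:
        s = sum(row) or 1.0
        t.append([x / s for x in row])
    # Labeled points start one-hot; unlabeled points start uniform.
    f = [one_hot(y, n_classes) if y is not None else [1.0 / n_classes] * n_classes
         for y in labels]
    for _ in range(iters):
        # Each point takes the weighted average of its neighbors' distributions.
        f = [[sum(t[i][k] * f[k][c] for k in range(n)) for c in range(n_classes)]
             for i in range(n)]
        # Clamp labeled points back to their known labels.
        for i, y in enumerate(labels):
            if y is not None:
                f[i] = one_hot(y, n_classes)
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Two invented clusters, one labeled point in each.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.05, 0.1], [4.9, 5.05]]
labels = [0, None, 1, None, None, None]
print(label_propagation(points, labels, n_classes=2))  # [0, 0, 1, 1, 0, 1]
```

The dense affinity matrix costs O(n²) time and memory, so larger corpora usually switch to a sparse k-nearest-neighbor graph. The same propagation step then runs over far fewer edges.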

Keywords

Word sense disambiguation, semi-supervised learning, context embedding, biomedical domain