Integrating Extra Knowledge into Word Embedding Models for Biomedical NLP Tasks
2017 International Joint Conference on Neural Networks (IJCNN), 2017
Abstract
Word embedding has attracted increasing attention in the NLP area in recent years. The continuous bag-of-words model (CBOW) and the continuous Skip-gram model (Skip-gram) have been developed to learn distributed representations of words from large amounts of unlabeled text data. In this paper, we explore the idea of integrating extra knowledge into the CBOW and Skip-gram models and applying the new models to biomedical NLP tasks. The main idea is to construct a weighted graph from knowledge bases (KBs) to represent structured relationships among words/concepts. In particular, we propose a GCBOW model and a GSkip-gram model by integrating such a graph into the original CBOW model and Skip-gram model, respectively, via graph regularization. Our experiments on four general-domain standard datasets show encouraging improvements with the new models. Further evaluations on two biomedical NLP tasks (a biomedical similarity/relatedness task and a biomedical Information Retrieval (IR) task) show that our methods outperform the baselines.
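The abstract describes adding a graph-regularization term to the embedding objective, so that words linked in the weighted KB graph are encouraged to have similar vectors. The following sketch illustrates that idea in isolation; the function names, toy graph, and hyperparameters are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a graph-regularization
# penalty of the form
#   R(V) = lambda * sum_{(i,j)} w_ij * ||v_i - v_j||^2
# that would be added to a CBOW/Skip-gram loss, pulling together the
# embeddings of words connected in a weighted knowledge-base graph.

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4
V = rng.normal(size=(vocab_size, dim))  # toy word-embedding matrix

# Weighted KB graph as an edge list: (word_i, word_j, weight)
edges = [(0, 1, 1.0), (1, 2, 0.5), (3, 4, 2.0)]
lam = 0.1  # regularization strength (hypothetical value)

def graph_penalty(V, edges, lam):
    """Graph-regularization term added to the embedding loss."""
    return lam * sum(w * np.sum((V[i] - V[j]) ** 2) for i, j, w in edges)

def graph_penalty_grad(V, edges, lam):
    """Gradient of the penalty with respect to each embedding row."""
    G = np.zeros_like(V)
    for i, j, w in edges:
        diff = V[i] - V[j]
        G[i] += 2 * lam * w * diff
        G[j] -= 2 * lam * w * diff
    return G

# A few gradient steps on the penalty alone: linked embeddings move closer.
before = graph_penalty(V, edges, lam)
for _ in range(50):
    V -= 0.1 * graph_penalty_grad(V, edges, lam)
after = graph_penalty(V, edges, lam)
print(after < before)  # the penalty decreases under gradient descent
```

In the proposed models this penalty would be optimized jointly with the CBOW or Skip-gram objective, trading off corpus statistics against KB structure through the regularization weight.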
Keywords
knowledge integration, word embedding models, natural language processing, biomedical NLP, continuous bag-of-words model (CBOW), continuous Skip-gram model, weighted graph, knowledge bases (KBs), information retrieval (IR)