Domain relation extraction from noisy Chinese texts

Neurocomputing (2020)

Abstract
Relation extraction is a typical method for extending a knowledge graph (KG); when it is applied to a particular domain, however, text sparsity becomes a notable issue. Distant supervision is introduced as a remedy, but it also brings noise. Both issues are especially severe for Chinese domain texts, since Chinese KGs are less developed than those of widely studied languages such as English. To tackle this challenge, we propose a complementary convolutional neural network (com-CNN) with attentional multiple instance learning (MIL) that obtains highly comprehensive features for relation extraction and alleviates the negative effect of sentence-level noise. com-CNN fully captures information from two different representations of a relation instance, the raw word sequence (RWS) and the multiple dependency path (MDP), and enables them to complement each other. To better combine RWS and MDP, we design a flexible feature fusion method. To mitigate the over-fitting of the attention mechanism caused by sparse texts, entity information is used to guide the computation of attention scores for the multiple instances in a bag, which alleviates the impact of wrongly labelled data. Experiments on Chinese relation extraction show that our proposal outperforms state-of-the-art approaches and that combining RWS and MDP generates more representative features for relation extraction. The empirical results also validate our intention: entity-integrated attentional MIL offers the best denoising performance for sparse domain texts among the alternatives compared.
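The entity-guided attention over a bag of instances can be illustrated with a minimal sketch. The abstract does not say how entity information enters the attention score, so everything below is an assumption made for illustration: the query is taken to be the difference between the tail and head entity embeddings, the score is a plain dot product, and the names `softmax` and `entity_guided_bag` are hypothetical. This is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_guided_bag(instances, head, tail):
    """Weight the sentence vectors in a bag by their match to an entity query.

    instances: list of sentence feature vectors (one per sentence in the bag)
    head, tail: entity embeddings with the same dimension as the instances

    Assumption (not from the paper): the query is tail - head, and the
    attention score of an instance is its dot product with that query.
    """
    query = [t - h for h, t in zip(head, tail)]
    scores = [sum(x * q for x, q in zip(inst, query)) for inst in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    # Bag representation: attention-weighted sum of the instance vectors.
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances))
           for d in range(dim)]
    return weights, bag


# Toy bag: the first sentence vector aligns with the entity query,
# the third points the opposite way (a stand-in for a mislabelled sentence).
toy_instances = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_weights, toy_bag = entity_guided_bag(toy_instances, [0.0, 0.0], [1.0, 0.0])
```

In this toy bag, the instance that agrees with the entity query receives the largest weight and the opposing one the smallest, which is the denoising behaviour the abstract attributes to attentional MIL. Because the query comes from the entities rather than from extra learned parameters, the scoring adds nothing to fit on sparse data, although whether this matches the paper's mechanism is not stated in the abstract.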
Keywords
Relation extraction, Text sparsity, Distant supervision, Attentional MIL