Information Extraction using Non-consecutive Word Sequences

Sachindra Joshi, Ganesh Ramakrishnan, Sreeram Balakrishnan, Ashwin Srinivasan

msra (2006)

Abstract

We address an important deficiency in existing machine learning approaches for information extraction from natural language texts. Existing techniques for information extraction employ rules that exploit properties of consecutive word sequences. We argue that sequences of non-consecutive words capturing long-range contextual correlations are vital features for information extraction from natural language text. We propose an efficient method that extends the Apriori algorithm to mine frequently occurring non-consecutive word sequences from a given corpus. We also perform a simplistic aggregation of feature information across multiple mentions of an entity in a document to avoid independent classification of the multiple occurrences of the entity. Experiments on some standard data sets show substantial improvements over previously reported results.
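
The abstract does not spell out how the Apriori extension works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a "non-consecutive word sequence" is an ordered subsequence of a sentence with no limit on the gaps, and that support is the number of sentences containing the sequence. The names `occurs_in`, `mine_frequent_sequences`, `min_support`, and `max_len` are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def occurs_in(pattern: Pattern, tokens: List[str]) -> bool:
    """True if the words of `pattern` appear in `tokens` in order, gaps allowed."""
    it = iter(tokens)
    # `word in it` advances the iterator, so each match must follow the previous one.
    return all(word in it for word in pattern)


def mine_frequent_sequences(
    sentences: List[List[str]], min_support: int, max_len: int = 3
) -> Dict[Pattern, int]:
    """Level-wise (Apriori-style) mining of frequent ordered word sequences.

    Assumption: support = number of sentences that contain the sequence.
    """
    # Level 1: count each word once per sentence.
    counts: Counter = Counter()
    for tokens in sentences:
        for word in set(tokens):
            counts[(word,)] += 1
    level = {p: c for p, c in counts.items() if c >= min_support}

    frequent: Dict[Pattern, int] = {}
    length = 1
    while level:
        frequent.update(level)
        if length == max_len:
            break
        # Join step: extend p with the last word of q when they overlap
        # on all but the first word of p and the last word of q.
        candidates = set()
        for p in level:
            for q in level:
                if p[1:] == q[:-1]:
                    cand = p + (q[-1],)
                    # Prune step: every sequence obtained by dropping one word
                    # must already be frequent (anti-monotonicity of support).
                    if all(cand[:i] + cand[i + 1:] in level for i in range(len(cand))):
                        candidates.add(cand)
        # Count step: keep only candidates that reach the support threshold.
        next_level = {}
        for cand in candidates:
            support = sum(1 for tokens in sentences if occurs_in(cand, tokens))
            if support >= min_support:
                next_level[cand] = support
        level = next_level
        length += 1
    return frequent
```

Calling `mine_frequent_sequences` on a list of tokenized sentences returns a dictionary from word tuples to sentence counts. Because gaps are unrestricted in this sketch, dropping any word from a frequent sequence leaves a frequent sequence, which is what makes the prune step valid. A practical system would likely bound the gap or window size, and that would require a different pruning rule.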