Learning dictionaries for information extraction by multi-level bootstrapping

AAAI/IAAI (1999)

Citations: 1212 | Views: 553
Abstract
Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual bootstrapping technique to alternately select the best extraction pattern for the category and bootstrap its extractions into the semantic lexicon, which is the basis for selecting the next extraction pattern. To make this approach more robust, we add a second level of bootstrapping (metabootstrapping) that retains only the most reliable lexicon entries produced by mutual bootstrapping and then restarts the process. We evaluated this multilevel bootstrapping technique on a collection of corporate web pages and a corpus of terrorism news articles. The algorithm produced high-quality dictionaries for several semantic categories.
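The two levels described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: each pattern is represented simply as the set of noun phrases it extracts, patterns are scored with an RlogF-style metric, (F / N) * log2(F), and a candidate's reliability is the number of selected patterns that extracted it. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def score_pattern(extractions, lexicon):
    """Assumed RlogF-style score: (F / N) * log2(F).

    F = extractions already in the lexicon, N = all extractions.
    The abstract does not name a scoring metric; this one is an assumption.
    """
    n = len(extractions)
    f = len(extractions & lexicon)
    if n == 0 or f == 0:
        return 0.0
    return (f / n) * math.log2(f)


def mutual_bootstrap(patterns, lexicon, max_iters=10):
    """Inner level: alternately pick the best unused pattern and add all of
    its extractions to a temporary lexicon, which drives the next choice."""
    temp = set(lexicon)
    chosen = []
    for _ in range(max_iters):
        remaining = [p for p in patterns if p not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda p: (score_pattern(patterns[p], temp), p))
        if score_pattern(patterns[best], temp) <= 0.0:
            break  # no remaining pattern is supported by the lexicon
        chosen.append(best)
        temp |= patterns[best]
    return chosen, temp


def meta_bootstrap(patterns, seeds, rounds=2, keep=1, max_iters=10):
    """Outer level: keep only the `keep` most reliable new entries from each
    mutual-bootstrapping run, then restart from the enlarged lexicon."""
    lexicon = set(seeds)
    for _ in range(rounds):
        chosen, temp = mutual_bootstrap(patterns, lexicon, max_iters)
        candidates = temp - lexicon
        # Assumed reliability: number of selected patterns extracting the word.
        ranked = sorted(
            candidates,
            key=lambda w: (-sum(w in patterns[p] for p in chosen), w),
        )
        lexicon |= set(ranked[:keep])
    return lexicon


# Hypothetical toy data for a "location" category.
toy_patterns = {
    "headquartered in <x>": {"chicago", "boston", "tokyo"},
    "offices in <x>": {"boston", "tokyo", "paris"},
    "thanks to <x>": {"tokyo", "luck", "sponsors", "readers"},
}
toy_seeds = {"chicago", "boston"}

if __name__ == "__main__":
    print(sorted(meta_bootstrap(toy_patterns, toy_seeds, rounds=2, keep=1)))
```

In this toy run the noisy pattern "thanks to <x>" is never selected, because it shares at most one entry with the lexicon and therefore scores zero under the assumed metric. The outer loop admits only one new entry per round, which is the sense in which the second level keeps unreliable extractions out of the permanent lexicon.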
Keywords
mutual bootstrapping technique, multilevel bootstrapping technique, mutual bootstrapping, information extraction system, best extraction pattern, multi-level bootstrapping, reliable lexicon entry, next extraction pattern, extraction pattern, semantic lexicon, multilevel bootstrapping algorithm, web pages, information extraction