Refining Information Extraction Rules using Data Provenance

Bin Liu, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick R. Reiss

IEEE Data Eng. Bull. (2010)

Abstract

Developing high-quality information extraction (IE) rules, or extractors, is an iterative and primarily manual process, extremely time consuming, and error prone. In each iteration, the outputs of the extractor are examined, and the erroneous ones are used to drive the refinement of the extractor in the next iteration. Data provenance explains the origins of an output data, and how it has been transformed through a query. As such, one can expect data provenance to be valuable in understanding and debugging complex IE rules. In this paper we discuss how data provenance can be used beyond understanding and debugging, to automatically refine IE rules. In particular, we overview the main ideas behind a recent provenance-based solution for suggesting a ranked list of refinements to an extractor aimed at increasing its precision, and outline several related directions for future research.

Keywords

information extraction
