Integrating data from the web by machine-learning tree-pattern queries

ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2006: COOPIS, DOA, GADA, AND ODBAS, PT 1, PROCEEDINGS(2006)

Abstract
Efficient and reliable integration of web data requires building programs called wrappers. Hand-writing wrappers is tedious and error-prone. Because the web changes constantly, wrappers must also be refactored constantly. Machine learning has proven useful, but current techniques are either limited in expressivity, require non-intuitive user interaction, or do not allow for n-ary extraction. We study tree patterns as an n-ary extraction language and propose an algorithm for learning such queries. It computes the most information-conservative tree pattern that generalizes two input trees. A notable aspect is that the approach can learn queries containing both child and descendant relationships between nodes. More importantly, the proposed approach requires no labeling other than the data the user actually wants to extract. The reported experiments show the effectiveness of the approach.
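The abstract does not spell out the learning algorithm, so the Python sketch below is only a rough illustration of the underlying idea, not the paper's method. It greedily builds *a* common generalization of two labeled trees: a child of the first tree is paired with a same-labeled child of the second (child edge `/`) or, failing that, with a deeper same-labeled node (descendant edge `//`), and unmatched subtrees are dropped. The resulting pattern therefore matches both inputs, but it is not guaranteed to be the most information-conservative one. All names (`Node`, `PNode`, `generalize`, `embeds`, `render`) are hypothetical.

```python
# Hypothetical sketch: greedy generalization of two trees into a tree pattern
# with child ("/") and descendant ("//") edges. NOT the paper's algorithm.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """A node of an input document tree (e.g. a parsed HTML element)."""
    label: str
    children: list[Node] = field(default_factory=list)


@dataclass
class PNode:
    """A tree-pattern node; label "*" is a wildcard.
    Each child edge carries an axis: "/" (child) or "//" (descendant)."""
    label: str
    children: list[tuple[str, PNode]] = field(default_factory=list)


def descendants(t: Node) -> Iterator[Node]:
    """Yield every node strictly below t, in document order."""
    for c in t.children:
        yield c
        yield from descendants(c)


def generalize(a: Node, b: Node) -> PNode:
    """Greedy common generalization of trees a and b (hypothetical)."""
    pattern = PNode(a.label if a.label == b.label else "*")
    used: set[int] = set()  # b-nodes already paired, so pairings stay distinct
    for ca in a.children:
        # 1) Prefer an unused direct child of b with the same label -> "/" edge.
        match = next((cb for cb in b.children
                      if cb.label == ca.label and id(cb) not in used), None)
        axis = "/"
        if match is None:
            # 2) Otherwise search deeper in b -> "//" edge; this still matches
            #    in a, because a child is also a descendant.
            match = next((d for cb in b.children for d in descendants(cb)
                          if d.label == ca.label and id(d) not in used), None)
            axis = "//"
        if match is not None:
            used.add(id(match))
            pattern.children.append((axis, generalize(ca, match)))
        # Children of a with no counterpart in b are generalized away.
    return pattern


def embeds(p: PNode, t: Node) -> bool:
    """True if pattern p matches with its root mapped to node t."""
    if p.label != "*" and p.label != t.label:
        return False
    for axis, pc in p.children:
        candidates = t.children if axis == "/" else list(descendants(t))
        if not any(embeds(pc, c) for c in candidates):
            return False
    return True


def render(p: PNode) -> str:
    """Compact textual form, e.g. html(/body(//table))."""
    if not p.children:
        return p.label
    return p.label + "(" + ", ".join(ax + render(c) for ax, c in p.children) + ")"


# Usage: the second page wraps the table in an extra <div>.
t1 = Node("html", [Node("body", [Node("table", [Node("tr", [Node("td")])])])])
t2 = Node("html", [Node("body", [Node("div", [
    Node("table", [Node("tr", [Node("td")])])])])])
p = generalize(t1, t2)
print(render(p))  # html(/body(//table(/tr(/td))))
```

The extra `div` in the second tree is what turns the `body`-to-`table` edge into a descendant edge, which is the kind of mixed child/descendant query the abstract mentions. Greedy pairing and a one-directional search (only children of the first tree are looked up deeper in the second) keep the sketch short; a faithful version would consider alternative pairings in both directions and keep the most specific pattern.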
Keywords
web data,error prone constant change,n-ary extraction language,current technique,information-conservative tree-pattern,integrating data,non-intuitive user interaction,tree-pattern query,refactored machine learning,descendant relationship,n-ary extraction,machine learning