Prequery Discovery of Domain-Specific Query Forms: A Survey

IEEE Transactions on Knowledge and Data Engineering (2013)

Abstract
The discovery of HTML query forms is one of the main challenges in Deep Web crawling. Automatic solutions for this problem perform two main tasks. The first is locating HTML forms on the Web, which is done through the use of traditional or focused crawlers. The second is identifying which of these forms are indeed meant for querying, which also typically involves determining a domain for the underlying data source (and thus for the form as well). This problem has attracted a great deal of interest, resulting in a long list of algorithms and techniques. Some methods submit requests through the forms and then analyze the data retrieved in response, typically requiring a great deal of knowledge about the domain as well as semantic processing. Others avoid form submission to sidestep such difficulties, although some of these techniques still rely to some extent on semantics and domain knowledge. This survey gives an up-to-date review of methods for the discovery of domain-specific query forms that do not involve form submission. We detail these methods and discuss how form discovery has become increasingly automated over time. We conclude with a forecast of what we believe are the immediate next steps in this trend.
Keywords
Web sites, hypermedia markup languages, query processing, HTML query form discovery, Web crawling, data source, domain knowledge, domain-specific query form prequery discovery, retrieved data analysis, semantic processing, Deep Web, domain-specific search, hidden Web, query form discovery