Classification-Based Adaptive Web Scraper

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 2017

Abstract
Web scraping is an important problem in computer science. Commonly used position-based or structure-based web scraping tools must be manually reconfigured whenever the structure of the web page changes. In this paper, we address this problem of information extraction for web pages consisting of repetitive blocks. We extract these blocks and their constituent attributes using a novel classification-based approach. Our approach gives high accuracy when used to extract product offers from an offer-aggregator website. It is also highly adaptive to changes in the structure of a website.
Keywords
Web Scraping, Information Extraction, Machine Learning