Classification-Based Adaptive Web Scraper

Ujwal B. V. S, Bharat Gaind, Abhishek Kundu, Anusha Holla, Mukund Rungta

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (2017)

Abstract

Web scraping is an important problem in computer science. The problem with commonly used position- or structure-based web scraping tools is that they must be manually reconfigured as soon as the structure of the web page changes. In this paper, we address this problem of information extraction for web pages that consist of repetitive blocks. We extract these blocks and their constituent attributes using a novel classification-based approach. Our approach achieves high accuracy when used to extract product offers from an offer-aggregator website. It is also highly adaptive to changes in a website's structure.
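
The abstract does not spell out the features or classifier the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. Repetitive blocks are assumed to be the largest group of sibling elements sharing one (tag, class) signature, and each text fragment inside a block is labeled as an attribute (title, price, discount) by a classifier rather than by its position. The feature set, the nearest-centroid classifier, the training strings, the sample pages, and every name (`find_blocks`, `features`, `NearestCentroid`, `extract_offers`) are hypothetical stand-ins, not the method from the paper. Only the Python standard library is used.

```python
# Hypothetical sketch, not the paper's algorithm: features and classifier are assumptions.
import math
import re
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class Node:
    def __init__(self, tag, cls, parent=None):
        self.tag, self.cls, self.parent = tag, cls, parent
        self.children, self.texts = [], []  # child elements, this element's own text

class TreeBuilder(HTMLParser):
    """Builds a minimal Node tree from well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("root", "")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs).get("class") or "", self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.cur = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.cur.parent is not None:
            self.cur = self.cur.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.texts.append(data.strip())

def walk(node):
    yield node  # depth-first, pre-order
    for child in node.children:
        yield from walk(child)

def find_blocks(root, min_repeats=2):
    """Assumed heuristic: blocks = largest group of siblings sharing one (tag, class)."""
    best = []
    for node in walk(root):
        sigs = Counter((c.tag, c.cls) for c in node.children)
        if sigs:
            sig, count = sigs.most_common(1)[0]
            if count >= min_repeats and count > len(best):
                best = [c for c in node.children if (c.tag, c.cls) == sig]
    return best

def texts_of(node):
    return [t for n in walk(node) for t in n.texts]  # all fragments inside one block

def features(text):
    """Hypothetical hand-picked features of one text fragment."""
    return [
        sum(ch.isdigit() for ch in text) / max(len(text), 1),  # digit ratio
        1.0 if re.search(r"[$€£]", text) else 0.0,             # currency symbol
        1.0 if "%" in text else 0.0,                            # percent sign
        min(len(text.split()), 10) / 10,                        # word count, capped
    ]

class NearestCentroid:
    """Stand-in classifier: pick the label whose mean feature vector is closest."""
    def fit(self, texts, labels):
        sums, counts = {}, Counter(labels)
        for text, label in zip(texts, labels):
            acc = sums.setdefault(label, [0.0] * 4)
            for i, value in enumerate(features(text)):
                acc[i] += value
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, text):
        vec = features(text)
        return min(self.centroids, key=lambda lab: math.dist(vec, self.centroids[lab]))

def extract_offers(html, clf):
    """Label each fragment of each block by classification, not by its position."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [{clf.predict(t): t for t in texts_of(b)} for b in find_blocks(builder.root)]

TRAIN = ["Wireless Mouse Pro", "Noise Cancelling Headphones", "$19.99", "$249", "20% off", "Flat 5% cashback"]
LABELS = ["title", "title", "price", "price", "discount", "discount"]

PAGE = """<div id="offers">
  <div class="offer"><h3>Bluetooth Speaker Mini</h3><span>$35.50</span><em>10% off</em></div>
  <div class="offer"><h3>USB-C Charging Cable</h3><span>$9</span><em>Extra 15% off</em></div>
  <div class="offer"><h3>Laptop Stand</h3><span>$42.00</span><em>5% off</em></div>
</div><p class="footer">About us</p>"""

# Same offers after a hypothetical redesign: new tags, new classes, reordered fields.
RESTYLED = """<ul><li class="card"><b>$35.50</b><i>10% off</i><a>Bluetooth Speaker Mini</a></li>
<li class="card"><b>$9</b><i>Extra 15% off</i><a>USB-C Charging Cable</a></li></ul>"""

if __name__ == "__main__":
    clf = NearestCentroid().fit(TRAIN, LABELS)
    for offer in extract_offers(PAGE, clf) + extract_offers(RESTYLED, clf):
        print(offer)
```

Because attribute labels come from each fragment's content rather than its location, the restyled page yields the same records without reconfiguration, which is the kind of adaptivity the abstract contrasts with position-based tools. The sketch assumes at most one fragment per attribute in each block; a practical scraper would need richer features and far more training data.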

Keywords

Web Scraping, Information Extraction, Machine Learning