Graph Neural Networks for Nomination and Representation Learning of Web Elements

Alexandra Hotti, Riccardo Sven Risuleo, Stefan Magureanu, Aref Moradi, Jens Lagergren

arXiv (2022)

Abstract

This paper tackles the under-explored problem of DOM element nomination and representation learning with three contributions. First, we present a large-scale, realistic dataset of webpages that is far richer and more diverse than other datasets proposed for element representation learning, classification and nomination on the web. The dataset contains 51,701 manually labeled product pages from 8,175 real e-commerce websites. Second, we adapt several Graph Neural Network (GNN) architectures to website DOM trees and benchmark their performance on a diverse set of element nomination tasks using the proposed dataset. In element nomination, a single element on a page is selected for a given class. We show that on this challenging dataset a simple Convolutional GNN outperforms state-of-the-art methods on web element nomination. Finally, we propose a new training method that further improves element nomination accuracy. In web element nomination, classification (assigning a class to a given element) is usually used as a surrogate objective for nomination during training. Our training methodology steers the classification objective towards the more complex and more useful nomination objective.
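
The difference between classification (element to class) and nomination (class to a single element) can be illustrated with a small sketch. The following is a minimal pure-Python example, assuming a standard graph-convolution layer with symmetric normalization over the DOM tree treated as an undirected graph, and a matrix of per-element class scores. All function names are hypothetical, and the element-wise softmax loss is only an illustration of a nomination-style objective; it is not the training method proposed in the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(parents: List[int]) -> Matrix:
    """Assumed GCN normalization D^-1/2 (A + I) D^-1/2 of an undirected DOM tree.

    parents[i] is the index of node i's parent, or -1 for the root (hypothetical encoding).
    """
    n = len(parents)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child][parent] = adj[parent][child] = 1.0  # parent-child edge, both directions
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]


def classify(scores: Matrix) -> List[int]:
    """Classification: assign every element its highest-scoring class."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def nominate(scores: Matrix) -> List[int]:
    """Nomination: for every class, select the single highest-scoring element on the page."""
    n_classes = len(scores[0])
    return [max(range(len(scores)), key=lambda i: scores[i][c]) for c in range(n_classes)]


def nomination_loss(scores: Matrix, class_idx: int, target_element: int) -> float:
    """Illustrative cross-entropy with the softmax taken over elements for one class."""
    column = [row[class_idx] for row in scores]
    m = max(column)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in column))
    return log_z - column[target_element]


if __name__ == "__main__":
    # Rows are elements, columns are classes (toy values).
    scores = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]
    print(classify(scores))  # [1, 0, 0]: two elements both receive class 0
    print(nominate(scores))  # [1, 0]: exactly one element per class
```

The toy scores show why classification is only a surrogate: it labels two elements with class 0, while nomination must commit to one element per class, which is what an element-wise objective scores directly.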

Keywords

graph neural networks, representation learning, neural networks, web
