Incremental Schema Generation for Large and Evolving RDF Sources
Transactions on Large-Scale Data- and Knowledge-Centered Systems LI, Lecture Notes in Computer Science (2022)
Abstract
The lack of a descriptive schema for an RDF dataset has motivated several research works addressing the problem of automatic schema discovery. The goal of these approaches is to provide the underlying structural schema of a given RDF dataset, either from the existing instances or from schema-related declarations, if provided. However, as the instances in the RDF dataset evolve, the generated schema may become inconsistent with the dataset. It is therefore necessary to incrementally update the existing schema according to the changes occurring in the dataset over time. In this paper, we propose a schema discovery approach for massive RDF datasets which incrementally deals with both the insertion and the deletion of entities. It is based on a scalable and incremental density-based clustering algorithm which propagates the changes occurring in the dataset into the clusters corresponding to the classes of the schema. Our approach is implemented using big data technologies to scale up to massive data while providing a high-quality clustering result. We present experiments which demonstrate the efficiency of our proposal on both synthetic and real datasets.
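To make the idea of propagating a change into existing clusters concrete, the following is a minimal single-machine sketch of the insertion case only. It is not the authors' algorithm or their distributed implementation. It assumes that each entity is described by its set of property names, that similarity is measured with the Jaccard index, and that the usual density-based parameters (a similarity threshold `eps` and a minimum neighbourhood size `min_pts`) are given. The class name `IncrementalClusters` and all method names are hypothetical, and deletion is deliberately left out.

```python
from itertools import count


def jaccard(a, b):
    """Jaccard similarity of two property sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class IncrementalClusters:
    """Illustrative incremental density-based clustering (insertion only)."""

    def __init__(self, eps=0.5, min_pts=2):
        self.eps = eps            # assumed similarity threshold for neighbours
        self.min_pts = min_pts    # neighbours (self included) needed to be core
        self.props = {}           # entity -> frozenset of property names
        self.label = {}           # entity -> cluster id, None means noise
        self._next_id = count()

    def neighbours(self, e):
        """All entities (including e) at least eps-similar to e; linear scan."""
        p = self.props[e]
        return [o for o, q in self.props.items() if jaccard(p, q) >= self.eps]

    def is_core(self, e):
        return len(self.neighbours(e)) >= self.min_pts

    def insert(self, e, properties):
        self.props[e] = frozenset(properties)
        self.label[e] = None
        near = self.neighbours(e)  # only these entities can change status

        def newly_core(o):
            # e is newly core if it is core at all; an older entity is newly
            # core if e raised its neighbourhood size to exactly min_pts.
            n = len(self.neighbours(o))
            return n >= self.min_pts if o == e else n == self.min_pts

        for q in [o for o in near if newly_core(o)]:
            region = self.neighbours(q)
            cores = [o for o in region if self.is_core(o)]
            # Clusters of all core entities around q are now density-connected.
            touched = {self.label[o] for o in cores} - {None}
            target = min(touched) if touched else next(self._next_id)
            for o, lab in self.label.items():
                if lab in touched:
                    self.label[o] = target      # merge touched clusters
            for o in region:
                if self.label[o] is None:
                    self.label[o] = target      # absorb former noise

        # e is not core but lies next to an existing core: attach it as border.
        if self.label[e] is None:
            for o in near:
                if o != e and self.is_core(o):
                    self.label[e] = self.label[o]
                    break
```

In this sketch, an inserted entity either stays noise, starts a new cluster, is absorbed by an existing cluster, or causes several clusters to merge, and only the neighbourhood of the new entity is inspected to decide which. Each resulting cluster would play the role of a candidate class of the schema. The linear neighbourhood scan is chosen for brevity and would not scale to the dataset sizes the paper targets.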
Key words
Schema Matching, Column-oriented Database Systems, Data Integration