Graph pattern detection and structural redundancy reduction to compress named graphs

Information Sciences (2023)

Abstract
The flexible paradigm of the Resource Description Framework (RDF) has accelerated the publication of raw data on the web. As a result, the volume of RDF data has grown dramatically over the last decade, making compression essential for managing and reducing the size of RDF datasets. Universal RDF compressors can detect and remove redundancy at the symbolic, syntactic, or semantic level. However, they rarely exploit the graph patterns and structural regularities found in real-world datasets. HDT (Header-Dictionary-Triple) is an efficient approach for compressing RDF datasets based on their structural properties. However, it cannot handle RDF datasets with named graphs, regularities in the graph structure, or structural redundancies, because it assumes that all triples reside in the same default graph, whereas the triples of an RDF dataset may belong to different named graphs. In this study, we propose a novel approach to address these challenges. We introduce a hybrid TI-GI (Triple Interpreter-Graph Interpreter) to manage RDF datasets with named graphs and to use a compact RDF serialization. We also propose RDF-RR (RDF-Redundancy Reducer) and object mapping, which detect and remove structural redundancies by identifying common patterns involving the predicates and objects in RDF datasets. We employ a differential compressor that discovers frequent graph patterns in a single pass by using a data-structure-oriented view of the dataset. An evaluation on real-world datasets shows that our approach reduces the size of the experimental RDF datasets by approximately 30.52%, 24.92%, and 26.96% compared with the existing HDT, HDT-FoQ (HDT-Focused on Querying), and 2Tp (two Triple Predicate based index) approaches, respectively. Moreover, the indexing time of our system is reduced by approximately 17.89%, 13.70%, and 9.32% compared with HDT, HDT-FoQ, and 2Tp, respectively.
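The abstract does not spell out how RDF-RR or object mapping work internally. As a rough illustration of what "structural redundancy" can mean in a dataset with named graphs, the sketch below (a hypothetical simplification, not the authors' TI-GI, RDF-RR, or HDT itself) first assigns integer IDs to terms, loosely in the spirit of HDT's dictionary component, and then, within each named graph, groups subjects that share exactly the same set of predicate-object pairs. Such a shared pattern could be stored once and referenced by its subjects. The function names `build_dictionary`, `find_shared_patterns`, and `triples_saved`, and the quad-tuple input format, are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input format: quads (subject, predicate, object, graph) as plain strings.
Quad = tuple[str, str, str, str]


def build_dictionary(quads: list[Quad]) -> dict[str, int]:
    """Assign each distinct RDF term a small integer ID (a flat stand-in for an HDT-style dictionary)."""
    term_to_id: dict[str, int] = {}
    for quad in quads:
        for term in quad:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
    return term_to_id


def find_shared_patterns(quads: list[Quad]):
    """Per named graph, group subjects by their full set of (predicate, object) ID pairs.

    Returns (ids, shared), where shared maps graph_id -> {pattern: [subject_ids]}
    and only keeps patterns used by two or more subjects (candidate redundancy).
    """
    ids = build_dictionary(quads)
    # graph ID -> subject ID -> set of (predicate ID, object ID) pairs
    per_graph = defaultdict(lambda: defaultdict(set))
    for s, p, o, g in quads:
        per_graph[ids[g]][ids[s]].add((ids[p], ids[o]))

    shared = {}
    for g, subjects in per_graph.items():
        groups = defaultdict(list)
        for s, pairs in subjects.items():
            # A sorted tuple makes the pattern hashable and order-independent.
            groups[tuple(sorted(pairs))].append(s)
        repeated = {pat: subs for pat, subs in groups.items() if len(subs) > 1}
        if repeated:
            shared[g] = repeated
    return ids, shared


def triples_saved(shared) -> int:
    """Count (p, o) pairs no longer repeated if each shared pattern is stored once.

    Ignores the cost of storing the subject references themselves.
    """
    return sum(
        (len(subs) - 1) * len(pat)
        for patterns in shared.values()
        for pat, subs in patterns.items()
    )


if __name__ == "__main__":
    quads = [
        ("ex:alice", "rdf:type", "ex:Person", "ex:g1"),
        ("ex:alice", "ex:worksAt", "ex:ACME", "ex:g1"),
        ("ex:bob", "rdf:type", "ex:Person", "ex:g1"),
        ("ex:bob", "ex:worksAt", "ex:ACME", "ex:g1"),
        ("ex:carol", "rdf:type", "ex:Person", "ex:g2"),
    ]
    ids, shared = find_shared_patterns(quads)
    print(shared)                 # → {3: {((1, 2), (4, 5)): [0, 6]}}
    print(triples_saved(shared))  # → 2
```

In this toy dataset, `ex:alice` and `ex:bob` in graph `ex:g1` have identical predicate-object sets, so their pattern is detected once and two repeated pairs could be elided. `ex:carol` lives in a different named graph, which illustrates why keeping graph membership separate (rather than collapsing everything into one default graph) matters. A real compressor would also need partial-pattern matching and a compact encoding of the subject lists, which this sketch omits.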
Keywords
Compression, Graph pattern, Named graphs, RDF, Structural redundancy