Investigating Bloom Filters for Web Archives’ Holdings

Martin Klein, Lyudmila Balakireva, Karolina Holub, Drazenko Celjak, Ingeborg Rudomino

2022 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2022

Abstract
What web archives hold is often opaque to the public, and even experts in the domain struggle to provide precise assessments. Given the increasing need for and use of crawled and archived web resources, discovery of individual records as well as sharing of entire holdings are pressing use cases. We investigate Bloom Filters (BFs) and their applicability to address these use cases. We experiment with and analyze parameters for their creation, measure their performance, outline an approach for scalability, and describe various pilot implementations that showcase their potential to meet our needs. BFs come with beneficial characteristics and hence have enjoyed popularity in various domains. We highlight their suitability for web archiving use cases and how they can contribute to very fast and accurate search services.

CCS Concepts: Information systems → Data structures; Digital libraries and archives; Service discovery and interfaces
Keywords
bloom filters, web archives, web archive profiling, index sharing
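
The abstract refers to parameters for creating a Bloom filter and to looking up individual records in an archive's holdings. As a rough illustration only, and not the authors' implementation, the sketch below sizes a filter with the standard textbook formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2, then tests URLs for membership. The names `optimal_parameters` and `BloomFilter`, the SHA-256 double-hashing scheme, and the example URLs are all assumptions made for this sketch.

```python
import hashlib
import math
from typing import Iterator, Tuple


def optimal_parameters(n_items: int, fp_rate: float) -> Tuple[int, int]:
    """Return (bit count m, hash count k) for n_items and a target false-positive rate."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    """Hypothetical minimal Bloom filter over strings (e.g. archived URLs)."""

    def __init__(self, n_items: int, fp_rate: float) -> None:
        self.m, self.k = optimal_parameters(n_items, fp_rate)
        self.bits = bytearray((self.m + 7) // 8)  # m bits, rounded up to whole bytes

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: double hashing from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly held"; False means "definitely not held".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    holdings = BloomFilter(n_items=1000, fp_rate=0.01)
    holdings.add("http://example.org/")            # illustrative URL, not from the paper
    print("http://example.org/" in holdings)       # an added item is always reported as present
    print("http://example.org/missing" in holdings)
```

A lookup that returns False is definitive, while a lookup that returns True may be a false positive at roughly the configured rate, which is why the choice of m and k matters when an archive's holdings are shared as a filter.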