Efficient Index Updates For Mixed Update And Query Loads

2016 IEEE International Conference on Big Data (Big Data)

Abstract
Inverted index files are commonly used to support keyword search in document collections. While the offline construction of an index can be done efficiently, its incremental update remains a hard problem, especially when the index does not fit entirely in memory. We propose a novel approach for keeping index files up to date on a system that continuously serves both document updates and user queries. Unlike previous update policies, we use knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes for selectively enforcing a contiguous on-disk layout of the data, while requiring that the cost of consolidation be less than its estimated benefit. The first is the "greedy merge", inspired by the ski-rental problem as studied in the context of competitive analysis. The second is the "opportunistic prognosticator": by making reliable predictions, it turns the online problem into one amenable to offline optimization.
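The abstract names the two consolidation policies without detailing them, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It models one term's posting list as split into on-disk fragments and maps it onto the ski-rental problem: every query pays one extra seek per fragment beyond the first (the "rent"), and rewriting the list contiguously costs a fixed amount (the "buy"). The greedy policy applies the classic break-even rule, merging once the accumulated rent reaches the merge cost. The helper `forecast_merge` shows a prediction-based variant that merges only when forecast savings exceed the merge cost. `TermListState`, `forecast_merge`, the per-fragment seek penalty, and the fixed merge cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TermListState:
    """Hypothetical bookkeeping for one term's posting list on disk."""
    merge_cost: float                  # assumed one-time cost of a contiguous rewrite ("buy")
    accumulated_penalty: float = 0.0   # extra read cost paid so far due to fragmentation ("rent")
    fragments: int = 1                 # number of non-contiguous on-disk pieces

    def on_update(self) -> None:
        # Assumption: each incremental update appends one new, non-contiguous fragment.
        self.fragments += 1

    def on_query(self, seek_cost: float) -> bool:
        """Charge this query's fragmentation penalty; merge at the break-even point.

        Returns True if the query triggered a merge.
        """
        # Each fragment beyond the first costs one additional seek.
        self.accumulated_penalty += (self.fragments - 1) * seek_cost
        # Ski-rental break-even rule: stop "renting" once rent paid reaches the "buy" price.
        if self.fragments > 1 and self.accumulated_penalty >= self.merge_cost:
            self.fragments = 1
            self.accumulated_penalty = 0.0
            return True
        return False


def forecast_merge(fragments: int, seek_cost: float,
                   predicted_queries: float, merge_cost: float) -> bool:
    """Hypothetical prediction-based rule: merge only if forecast benefit exceeds cost."""
    predicted_benefit = predicted_queries * (fragments - 1) * seek_cost
    return predicted_benefit > merge_cost


if __name__ == "__main__":
    state = TermListState(merge_cost=10.0)
    state.on_update()  # the list now has two fragments
    print([state.on_query(seek_cost=3.0) for _ in range(4)])
    print(forecast_merge(fragments=3, seek_cost=2.0, predicted_queries=5, merge_cost=15.0))
```

With a merge cost of 10 and a penalty of 3 per query, the list is merged on the fourth query, once 12 units of rent have been paid. The break-even rule never pays more than roughly twice what an offline policy that knew the future query sequence would pay, which is the kind of competitive guarantee the ski-rental framing offers. The forecast rule instead acts immediately when it trusts its prediction.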
Keywords
Index, Inverted index file, Up-to-date index file