Improved methods for static index pruning

2016 IEEE International Conference on Big Data (Big Data), 2016

Abstract
Static index pruning is a performance optimization technique for search engines that attempts to identify and remove index postings that are unlikely to lead to top results for typical user queries. The goal is to obtain a much smaller inverted index that can quickly return results that are (almost) as good as those for the unpruned index. We make two contributions. First, we improve on previous results for pruned index size through a careful analysis of both document and query distribution characteristics. We derive an initial model based on unigram probabilities that obtains gains over previous work in some cases, and a bigram-based approach that achieves some additional improvements. We also devise a simple method for generating query logs in the absence of real-life queries, which is useful in modeling top results. Our second contribution is to explore previously proposed approaches that perform pruning based on how often documents or postings appeared in top positions in the past, and to compare against them.
Keywords
static pruning, index, search
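
To make the general idea in the abstract concrete, the following is a minimal sketch of static index pruning driven by unigram query-term probabilities. It is not the paper's model: the scoring rule (term probability times posting score), the function names `unigram_term_probs` and `prune_index`, the `keep_fraction` parameter, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter


def unigram_term_probs(query_log):
    """Estimate P(term) from a query log by relative frequency."""
    counts = Counter(term for query in query_log for term in query.split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def prune_index(index, term_probs, keep_fraction):
    """Keep the globally highest-valued postings of an inverted index.

    index: dict mapping term -> list of (doc_id, score) postings.
    A posting's value is P(term) * score. This is an illustrative
    heuristic (an assumption), not the method from the paper.
    """
    valued = [
        (term_probs.get(term, 0.0) * score, term, doc_id, score)
        for term, postings in index.items()
        for doc_id, score in postings
    ]
    # Highest-valued postings first.
    valued.sort(key=lambda v: v[0], reverse=True)
    budget = int(len(valued) * keep_fraction)

    pruned = {}
    for _, term, doc_id, score in valued[:budget]:
        pruned.setdefault(term, []).append((doc_id, score))
    return pruned


# Hypothetical toy query log and index.
query_log = ["static index", "index pruning", "index search"]
index = {
    "index": [(1, 2.0), (2, 0.5), (3, 1.5)],
    "pruning": [(1, 1.0), (4, 0.2)],
    "rare": [(5, 3.0)],
}

probs = unigram_term_probs(query_log)
pruned = prune_index(index, probs, keep_fraction=0.5)
```

In this toy setup, the term "rare" never occurs in the query log, so its posting gets value zero and is dropped despite its high score. A real system would likely also need smoothing for unseen terms and an evaluation of result quality against the unpruned index.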