Using an Inverted Index Synopsis for Query Latency and Performance Prediction
ACM Transactions on Information Systems(2020)
摘要
AbstractPredicting the query latency by a search engine has important benefits, for instance, in allowing the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing its effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, achieving accurate predictions of query latencies is difficult. We propose the use of index synopses—which are stochastic samples of the full index—for attaining accurate timing predictions. Indeed, we experiment using the TREC ClueWeb09 collection, and a large set of real user queries, and find that using small index synopses it is possible to very accurately estimate properties of the larger index, including sizes of posting list unions and intersections. Thereafter, we demonstrate that index synopses facilitate two key use cases: first, for query efficiency prediction, we show that predicting the query latencies on the full index and classifying long-running queries can be accurately achieved using index synopses; second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a synopsis index post-retrieval predictor than a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample of a larger index at predicting the properties of the larger index.
更多查看译文
关键词
Inverted index synopsis, query efficiency prediction, query performance prediction, dynamic pruning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络