Learning to Rank Documents for Ad-Hoc Retrieval with Regularized Models

MSRA (2007)

Abstract
In language modeling (LM) approaches to information retrieval (IR), the estimation of the document model is critical for retrieval effectiveness. Recent studies have shown that mixture models combining multiple resources can improve the accuracy of this estimation, which raises the problem of how to estimate the mixture weights. In most previous studies, the mixture weights are assigned manually; in some others they are learned, with or without supervision. However, in these studies the mixture weights are the same for all queries, and they can be very unbalanced. In this paper, we propose two regularized models that estimate query-dependent weights: one is a variant of the EM algorithm, Deterministic Annealing EM (DAEM), and the other is an L2-regularized log-linear model (RLM). Both rely on regularization to avoid unbalanced mixture weights and to generalize better to test data. We evaluate the two models on one TREC collection. Experimental results show that both models perform well; notably, RLM even outperforms the model whose mixture weights are tuned by exhaustive search over the parameter space.
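To make the first estimator concrete, below is a minimal NumPy sketch of Deterministic Annealing EM for mixture weights. The E-step posterior is tempered by an inverse temperature beta that is annealed toward 1; a small beta flattens the responsibilities toward uniform, which discourages the unbalanced weights the abstract warns about. The function name, interface, and annealing schedule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def daem_mixture_weights(component_probs, betas=(0.2, 0.5, 1.0), n_iters=20):
    """Estimate mixture weights lambda with Deterministic Annealing EM (DAEM).

    component_probs: array of shape (n_terms, n_components); entry [i, k]
    is P_k(w_i), the probability component model k assigns to observed
    term w_i. (Hypothetical interface for illustration.)
    """
    n_terms, n_components = component_probs.shape
    weights = np.full(n_components, 1.0 / n_components)  # start uniform
    for beta in betas:                       # anneal inverse temperature to 1
        for _ in range(n_iters):
            # Tempered E-step: r[i, k] proportional to (lambda_k * P_k(w_i))^beta.
            # beta < 1 flattens the posterior, pulling weights toward balance.
            r = (weights * component_probs) ** beta
            r /= r.sum(axis=1, keepdims=True)
            # M-step: lambda_k is the mean responsibility of component k.
            weights = r.mean(axis=0)
    return weights

# Toy usage: two components, e.g. a document model and a collection model.
doc_p = np.array([0.10, 0.05, 0.02])         # P(w|d) for three query terms
coll_p = np.array([0.01, 0.02, 0.04])        # P(w|C) for the same terms
print(daem_mixture_weights(np.column_stack([doc_p, coll_p])))
```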
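For the second estimator, here is a hedged sketch of an L2-regularized log-linear model that makes the weights query-dependent: lambda(q) = softmax(Theta f(q)), fit by gradient ascent on the mixture log-likelihood minus an L2 penalty (mu/2)||Theta||^2. The feature representation f(q), the training objective, and all hyperparameters are assumptions for illustration; the paper's exact objective may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                          # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def rlm_weights(theta, query_features):
    """Query-dependent weights: lambda(q) = softmax(Theta f(q))."""
    return softmax(theta @ query_features)

def train_rlm(features, comp_probs, n_components, mu=0.1, lr=0.05, n_iters=300):
    """Fit Theta by ascent on sum_q log-likelihood - (mu/2)||Theta||^2.

    features:   list of per-query feature vectors f(q), each shape (n_feats,)
    comp_probs: list of arrays, each shape (n_terms_q, n_components), holding
                P_k(w_i) for the terms of query q (assumed inputs)
    """
    n_feats = features[0].shape[0]
    theta = np.zeros((n_components, n_feats))
    for _ in range(n_iters):
        grad = -mu * theta                   # gradient of the L2 penalty
        for f, P in zip(features, comp_probs):
            lam = softmax(theta @ f)         # current weights for this query
            mix = P @ lam                    # mixture probability of each term
            resp = (P * lam) / mix[:, None]  # responsibilities r[i, k]
            # Chain rule through the softmax:
            # d logL / d z_k = sum_i r[i, k] - n_terms * lambda_k
            g_z = resp.sum(axis=0) - P.shape[0] * lam
            grad += np.outer(g_z, f)
        theta += lr * grad
    return theta

# Toy usage: one query, three terms, two components, two features.
P = np.array([[0.10, 0.01], [0.05, 0.02], [0.02, 0.04]])
f = np.array([1.0, 0.5])                     # hypothetical query features
theta = train_rlm([f], [P], n_components=2)
print(rlm_weights(theta, f))                 # learned lambda(q), sums to 1
```

The L2 penalty plays the same role as the annealing above: it keeps Theta, and hence the weights, from drifting to extreme, unbalanced values that overfit the training queries.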
Keywords
information retrieval, unsupervised learning, regularized mixture model, language modeling, mixture model, log-linear model, EM algorithm, language model, learning to rank, parameter space, exhaustive search