# On ranking via sorting by estimated expected utility

NeurIPS 2020

Abstract

Ranking tasks are defined through losses that measure trade-offs between different desiderata such as the relevance and the diversity of the items at the top of the list. This paper addresses the question of which of these tasks are asymptotically solved by sorting by decreasing order of expected utility, for some suitable notion of utility.

Introduction

- The usual approach in learning to rank is to score each item given the input, and produce the ranking by sorting in decreasing order of scores.
- This score-and-sort approach follows the probability ranking principle of information retrieval [29], which stipulates that documents should be rank-ordered according to their estimated probability of relevance to the query.
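The score-and-sort approach described above can be sketched in a few lines; a minimal illustration, where `score_fn` stands for any hypothetical learned scoring function (here just an estimated relevance probability):

```python
import numpy as np

def score_and_sort(items, score_fn):
    """Rank items by decreasing score, as in the probability ranking principle."""
    scores = np.array([score_fn(x) for x in items])
    order = np.argsort(-scores)  # item indices, highest score first
    return [items[i] for i in order]

# toy example: each document carries a hypothetical estimated relevance probability
docs = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5)]
ranking = score_and_sort(docs, score_fn=lambda d: d[1])
# → [('d2', 0.9), ('d3', 0.5), ('d1', 0.2)]
```

The learning problem then reduces to choosing the scoring function; the ranking itself is always produced by this same sort.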

Highlights

- The usual approach in learning to rank is to score each item given the input, and produce the ranking by sorting in decreasing order of scores
- We study what ranking tasks are solved via sorting by expected utilities, in a general supervised ranking framework that captures different types of ground-truth signal and losses
- Since utilities can serve as target values to learn the scoring function through square loss regression, the optimality of sorting by expected utilities is equivalent to the consistency of regression
- The main question we address is: When is square loss regression consistent for ranking via score-and-sort?
- In Section 3.1, we showed that optimal scoring functions for ranking losses that are not compatible with expected utility (non-CEU losses) are discontinuous for some data distributions. This is illustrated on the Expected Reciprocal Rank (ERR) and the AP, on random distributions where the ERR/the AP have bad local minima, by measuring the percentage of optimization runs based on gradient descent that end up stuck in local minima
- For supervised ranking with the score-and-sort approach, learning the scoring function through regression is consistent for all ranking tasks for which a convex risk minimization approach is consistent
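The point that utilities can serve as regression targets can be made concrete with a toy sketch — assuming, purely for illustration, a linear true utility and using closed-form least squares (the paper's setting is more general):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical setup: items have d features, true expected utility is linear
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
utilities = X @ w_true + 0.1 * rng.normal(size=n)  # noisy utility observations

# square-loss regression on utility targets: closed-form least squares
w_hat, *_ = np.linalg.lstsq(X, utilities, rcond=None)

# rank a fresh query's items by estimated expected utility (score-and-sort)
items = rng.normal(size=(5, d))
ranking = np.argsort(-(items @ w_hat))
```

As the sample size grows, `w_hat` approaches `w_true`, so sorting by estimated utilities recovers sorting by true expected utilities — which is exactly when the regression approach is consistent for the ranking task.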

Results

- Sub-optimality of the local minima found by gradient descent: a fraction of the local minima are more than 90% sub-optimal. Figure: illustration of optimal rankings for the ERR and the AP, for the fictional search engine scenario with the ambiguous query “jaguar”.

Conclusion

- For supervised ranking with the score-and-sort approach, learning the scoring function through regression is consistent for all ranking tasks for which a convex risk minimization approach is consistent.
- For tasks with non-CEU ranking losses, one possible avenue is to develop efficient direct loss minimization approaches, such as approximations of the problem (NC) above, or the approach proposed by Song et al. [30].
- Another direction is to find alternatives to score-and-sort.
- A possible starting point would be to build on the recent work on excess risk bounds for non-calibrated losses [32].


- Table 1: Example of ranking losses with their utilities, if any. We give examples with different types of supervision, including DAGn, which is the set of directed acyclic graphs used in the computation of the pairwise disagreement (PD) studied by Duchi et al. [15]
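As a concrete instance of one of the ranking losses discussed here, a minimal sketch of the Expected Reciprocal Rank, following the cascade-model definition of Chapelle et al. (graded relevance, stop probability `(2**g - 1) / 2**g_max` at each position):

```python
def err(grades, g_max=4):
    """Expected Reciprocal Rank of a ranked list of relevance grades.

    The user scans positions top-down, stopping at rank r with probability
    R_r = (2**g_r - 1) / 2**g_max; ERR is the expected reciprocal stop rank.
    """
    score, p_continue = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        stop = (2 ** g - 1) / 2 ** g_max
        score += p_continue * stop / r
        p_continue *= 1 - stop
    return score

# putting a highly relevant document first yields a higher ERR than the reverse
assert err([4, 0, 1]) > err([1, 0, 4])
```

The early-stopping behavior is what makes ERR diversity-inducing: once a highly relevant item has likely satisfied the user, further near-duplicates contribute little to the expected utility.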

Figures

- Fig. 2 (middle) displays the sub-optimality of these local minima, showing that 25% of them are more than 10% sub-optimal
- Fig. 2 (right) illustrates optimal rankings for the ERR (diversity-inducing) and the AP (diversity-averse), for the fictional search engine scenario with the ambiguous query “jaguar”

References

- A. Agarwal, K. Takatsu, I. Zaitsev, and T. Joachims. A general framework for counterfactual learning-to-rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 5–14, 2019.
- F. G. Arenas. Alexandroff spaces. 1999.
- E. Bakshy, S. Messing, and L. A. Adamic. Exposure to ideologically diverse news and opinion on facebook. Science, 348(6239):1130–1132, 2015.
- A. R. Barron. Complexity Regularization with Application to Artificial Neural Networks, pages 561–576. Springer Netherlands, Dordrecht, 1991.
- S. Bird, S. Barocas, K. Crawford, F. Diaz, and H. Wallach. Exploring or exploiting? social and ethical implications of autonomous experimentation in ai. In Workshop on Fairness, Accountability, and Transparency in Machine Learning, 2016.
- D. Buffoni, C. Calauzènes, P. Gallinari, and N. Usunier. Learning scoring functions with order-preserving losses and standardized supervision. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 825–832, 2011.
- C. Calauzènes, N. Usunier, and P. Gallinari. On the (non-)existence of convex, calibrated surrogate losses for ranking. In Advances in Neural Information Processing Systems 25, pages 197–205. 2012.
- C. Calauzènes, N. Usunier, and P. Gallinari. Calibration and regret bounds for order-preserving surrogate losses in learning to rank. Machine learning, 93(2-3):227–260, 2013.
- O. Chapelle, D. Metlzer, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 621–630, 2009.
- O. Chapelle, S. Ji, C. Liao, E. Velipasaoglu, L. Lai, and S.-L. Wu. Intent-based diversification of web search results: metrics and algorithms. Information Retrieval, 14(6):572–592, 2011.
- C. Ciliberto, A. Rudi, and L. Rosasco. A consistent regularization approach for structured prediction. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, page 4419–4427, 2016.
- D. Cossock and T. Zhang. Statistical analysis of bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140–5154, 2008.
- O. Dekel, Y. Singer, and C. D. Manning. Log-linear models for label ranking. In Advances in neural information processing systems, pages 497–504, 2004.
- K. Dembczynski, W. Kotlowski, and E. Hüllermeier. Consistent multilabel ranking through univariate losses. arXiv preprint arXiv:1206.6401, 2012.
- J. C. Duchi, L. W. Mackey, and M. I. Jordan. On the consistency of ranking algorithms. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pages 327–334, 2010.
- T. Joachims, A. Swaminathan, and T. Schnabel. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 781–789, 2017.
- M. Kay, C. Matuszek, and S. A. Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828. ACM, 2015.
- J. Keshet and D. A. McAllester. Generalization bounds and consistency for latent structural probit and ramp loss. In Advances in Neural Information Processing Systems 24, pages 2205– 2212. 2011.
- W. Kotłowski, K. Dembczynski, and E. Hüllermeier. Bipartite ranking through minimization of univariate loss. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 1113–1120, 2011.
- J.-W. Kuo, P.-J. Cheng, and H.-M. Wang. Learning to rank from bayesian decision inference. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 827–836, 2009.
- Q. Le and A. Smola. Direct optimization of ranking measures. arXiv preprint arXiv:0704.3359, 2007.
- C. D. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. Cambridge university press, 2008.
- Q. Nguyen. On connected sublevel sets in deep learning. arXiv preprint arXiv:1901.07417, 2019.
- E. A. Ok. Real Analysis with Economic Applications. Number mathecon1 in Online economics textbooks. SUNY-Oswego, Department of Economics, January 2004.
- A. Osokin, F. Bach, and S. Lacoste-Julien. On structured prediction theory with calibrated convex surrogate losses. In Advances in Neural Information Processing Systems, pages 302–313, 2017.
- H. G. Ramaswamy and S. Agarwal. Convex calibration dimension for multiclass loss matrices. The Journal of Machine Learning Research, 17(1):397–441, 2016.
- H. G. Ramaswamy, S. Agarwal, and A. Tewari. Convex calibrated surrogates for low-rank loss matrices with applications to subset ranking losses. In Advances in Neural Information Processing Systems, pages 1475–1483, 2013.
- P. Ravikumar, A. Tewari, and E. Yang. On NDCG consistency of listwise ranking methods. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 618–626, 2011.
- S. E. Robertson. The probability ranking principle in IR. Journal of Documentation, 33(4):294–304, 1977.
- Y. Song, A. Schwing, R. Urtasun, et al. Training deep neural networks via direct loss minimization. In International Conference on Machine Learning, pages 2169–2177, 2016.
- I. Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 26(2):225–287, 2007.
- K. Struminsky, S. Lacoste-Julien, and A. Osokin. Quantifying learning guarantees for convex but inconsistent surrogates. In Advances in Neural Information Processing Systems, pages 669–677, 2018.
- M. Taylor, J. Guiver, S. Robertson, and T. Minka. Softrank: optimizing non-smooth rank metrics. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 77–86, 2008.
- L. Vaughan and Y. Zhang. Equal representation by search engines? a comparison of websites across countries and domains. Journal of computer-mediated communication, 12(3):888–909, 2007.
- M. N. Volkovs and R. S. Zemel. Boltzrank: learning to maximize expected ranking gain. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1089–1096, 2009.
- W. Waegeman, K. Dembczynski, A. Jachnik, W. Cheng, and E. Hüllermeier. On the Bayes-optimality of F-measure maximizers. Journal of Machine Learning Research, 15:3333–3388, 2014.
- R. Wijsman. Continuity of the Bayes risk. The Annals of Mathematical Statistics, 41(3): 1083–1085, 1970.
- J. I. Yellott Jr. The relationship between luce’s choice axiom, thurstone’s theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15(2): 109–144, 1977.
- Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 271–278, 2007.
