Debiasing Grid-based Product Search in E-commerce

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 2852-2860.

DOI: https://doi.org/10.1145/3394486.3403336
We study the novel problem of unbiased learning to rank algorithms in grid-based product search for e-commerce.

Abstract:

The widespread usage of e-commerce websites in daily life and the resulting wealth of implicit feedback data form the foundation for systems that train and test e-commerce search ranking algorithms. While convenient to collect, implicit feedback data inherently suffers from various types of bias since user feedback is limited to products ...

Introduction
  • Large-scale search ranking systems have been deployed for a variety of e-commerce websites such as Amazon, eBay, JD, Taobao and Walmart.
  • Unlike traditional list-based web search engines such as Google and Baidu, which display search engine result pages (SERPs) as 1-dimensional lists with textual information, e-commerce websites show SERPs as 2-dimensional grids along with images and meta information of the products.
  • Such a difference in display can significantly change the way users interact with SERPs. In a previous study [27], researchers observed several unique user behavior patterns in grid-based SERPs with images: (1) users may scroll down to browse more products by skipping some rows in the middle of each SERP, and (2) the decay of users’ attention is often slower than that in list-based web search.
  • Randomized experiments can be expensive and time-consuming, and can hurt the user experience.
Highlights
  • We propose the joint examination hypothesis, which extends the original examination hypothesis widely used in list-wise web search to handle multiple types of implicit feedback in the context of e-commerce.
  • We did not adopt these new evaluation metrics because, without eye-tracking experiments, we cannot obtain ground truth for their parameters, which quantify the decay of attention. In contrast to that line of work, we propose to incorporate the row skipping and slower decay click models into propensity score modeling for unbiased learning to rank.
  • We study the novel problem of unbiased learning to rank algorithms in grid-based product search for e-commerce.
  • We motivate the use of the row skipping and slower decay models for inverse propensity scoring with empirical evidence from data analysis.
  • Through extensive experiments on real-world e-commerce search log datasets across browsing devices and product taxonomies, we show that the proposed framework outperforms the state-of-the-art unbiased learning to rank algorithms.
  • Extensive experimental results show the effectiveness of the proposed framework across browsing devices and product taxonomies in datasets collected from a real-world e-commerce website.
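
The row skipping and slower decay ideas named in the highlights can be illustrated with a minimal sketch. The functional forms below are assumptions chosen for illustration and are not taken from the paper: a power-law decay over the row index (instead of the 1-D rank), and a constant probability of skipping any row after the first. The names `n_cols`, `eta` and `skip_prob` are hypothetical parameters.

```python
from typing import Tuple


def grid_position(rank: int, n_cols: int) -> Tuple[int, int]:
    """Map a 0-based rank in the result list to (row, col) in a grid."""
    return divmod(rank, n_cols)


def list_decay_propensity(rank: int, eta: float = 1.0) -> float:
    """Power-law position-bias model over the 1-D rank, as commonly
    assumed for list-based SERPs: (1 / (rank + 1)) ** eta."""
    return (1.0 / (rank + 1)) ** eta


def slower_decay_propensity(rank: int, n_cols: int, eta: float = 0.5) -> float:
    """ASSUMED form: attention decays with the row index rather than the
    rank, so all products in the same row share one propensity."""
    row, _ = grid_position(rank, n_cols)
    return (1.0 / (row + 1)) ** eta


def row_skipping_propensity(rank: int, n_cols: int, skip_prob: float = 0.3) -> float:
    """ASSUMED form: the first row is always examined; every later row is
    skipped independently with probability skip_prob."""
    row, _ = grid_position(rank, n_cols)
    return 1.0 if row == 0 else 1.0 - skip_prob
```

For a product at 0-based rank 5 in a 4-column grid (second row), the list-based model gives a propensity of 1/6, while the row-based decay with `eta = 0.5` gives about 0.71, which is the sense in which the decay is "slower" in this sketch.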
Results
  • At least one of the two proposed methods outperforms the baselines in almost all cases.
  • This demonstrates the effectiveness of the proposed unbiased ranker, which is able to capture unique user behavior patterns in grid-based product search with these two simple propensity models.
  • This is because unbiased LambdaMART relies on a different pairwise inverse propensity scoring strategy but shares the same underlying ranker (LambdaMART).
  • This observation can be attributed to incorporating prior knowledge of users’ behavior patterns to guide the learning process of propensity score models.
Conclusion
  • The authors study the novel problem of unbiased learning to rank algorithms in grid-based product search for e-commerce.
  • The proposed framework utilizes multiple types of feedback and leverages users’ behavior patterns in grid-based product search for propensity score modeling.
  • Future work includes (1) modeling propensity with meta information from SERPs, (2) relaxation of the joint examination hypothesis to handle multiple types of feedback, and (3) strategies to address products with low or no feedback in evaluation metrics.
Summary
  • Objectives:

    The authors aim to utilize all types of implicit feedback as supervision signals. Given the number of columns and rows of e-commerce SERPs, the authors aim to learn the propensity score model(s) used to reweight products for an unbiased estimate of the ranker's loss, and to train unbiased rankers with inverse propensity scoring to maximize e-commerce search metrics on held-out test data.
  • Based on the same intuition, in the proposed framework, the authors aim to find the joint optimum of the two models by minimizing the loss function through grid search on hyperparameters.
  • The authors aim to achieve a joint optimum of both the propensity score models and the ranker with the implicit feedback data.
  • The authors aim to learn a ranker based on the unbiased loss function L.
  • In e-commerce search, with the aim of helping buyers explore unseen items, the authors of [11] proposed a multi-armed bandit (MAB) method that allows exploration of items shown fewer than a certain number of times in a time interval.
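
The objectives in the summary (reweighting products by inverse propensity and selecting propensity hyperparameters by grid search on the loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ips_weighted_loss`, `grid_search` and the toy objective are hypothetical names, and a real pipeline would retrain the ranker for each candidate value.

```python
from typing import Callable, Iterable, Sequence, Tuple


def ips_weighted_loss(
    per_item_loss: Sequence[float],
    has_feedback: Sequence[int],
    propensity: Sequence[float],
) -> float:
    """Sum the loss of items that received feedback, each divided by the
    probability that the item was examined (inverse propensity scoring)."""
    total = 0.0
    for loss, feedback, p in zip(per_item_loss, has_feedback, propensity):
        if feedback:
            total += loss / p
    return total


def grid_search(
    candidates: Iterable[float],
    objective: Callable[[float], float],
) -> Tuple[float, float]:
    """Return the candidate hyperparameter with the lowest objective value.
    In practice, objective(value) would fit a ranker using propensities
    computed with this value and return its loss."""
    best_value, best_score = None, float("inf")
    for value in candidates:
        score = objective(value)
        if score < best_score:
            best_value, best_score = value, score
    return best_value, best_score


# Toy usage: three products, two with feedback; the loss of a product
# shown at a rarely examined position (propensity 0.25) is up-weighted.
loss = ips_weighted_loss([1.0, 2.0, 3.0], [1, 0, 1], [0.5, 0.25, 0.25])

# Toy objective standing in for "train the ranker, return its loss".
best_eta, best_loss = grid_search([0.25, 0.5, 1.0], lambda eta: (eta - 0.5) ** 2)
```

Dividing by the propensity is what compensates for products that were rarely examined; the grid search simply keeps whichever hyperparameter value yields the lowest loss.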
Tables
  • Table 1: Data statistics
  • Table 2: Feature description
  • Table 3: Experimental results comparing model effectiveness on the held-out test sets of the 4 datasets. Best results are highlighted in boldface. Significant improvements with respect to the best baseline are indicated with +
Related work
  • Here, we review the related work from three subareas: unbiased learning to rank, grid-based search, and e-commerce search. Unbiased learning to rank is an area where causal inference [12] helps learning to rank. Given the same attractiveness (relevance), the probability of products (documents) being clicked may change significantly with many factors in the SERPs of product (web) search. Position is one of the most significant factors and has been studied in list-wise web search [2, 14, 16, 22, 23]. As the literature on unbiased learning to rank focuses on solving the problem of position bias in traditional information retrieval systems, here we use the terms document and relevance instead of product and attractiveness. Joachims et al. [16] analyzed the inherent position bias in search log data with implicit feedback and proposed the Propensity SVM-Rank [15] algorithm, which applies inverse propensity scoring to each clicked document to mitigate the position bias. In particular, the propensity score of each position is estimated through a randomized experiment which randomly picks and swaps items at a pair of positions [16]. In [1], the authors extended the Propensity SVM-Rank model to directly optimize additive information retrieval metrics such as DCG and proposed to replace the SVM-Rank model with neural networks. However, such randomized experiments may degrade users’ experience and would likely be ...
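
The inverse propensity scoring argument referenced above can be written out briefly. The notation is introduced here for illustration and is not taken from the paper: $o_i$ indicates whether document $i$ was examined, $r_i$ whether it is relevant, $c_i = o_i r_i$ is the click under the examination hypothesis with noise-free clicks, $\Delta_i$ is the loss contribution of document $i$, and $P(o_i = 1) > 0$ is assumed.

```latex
\hat{L}_{\mathrm{IPS}} = \sum_{i:\, c_i = 1} \frac{\Delta_i}{P(o_i = 1)},
\qquad
\mathbb{E}_{o}\!\left[\hat{L}_{\mathrm{IPS}}\right]
= \sum_{i:\, r_i = 1} P(o_i = 1)\,\frac{\Delta_i}{P(o_i = 1)}
= \sum_{i:\, r_i = 1} \Delta_i .
```

In expectation over examination, the reweighted loss on clicked documents equals the loss over all relevant documents, which is why accurate propensity estimates matter.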
Funding
  • This material is partially based upon work supported by the National Science Foundation (NSF) Grants #1614567 and #1909555.
References
  • [1] Aman Agarwal, Ivan Zaitsev, and Thorsten Joachims. 2018. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models. arXiv preprint arXiv:1805.00065 (2018).
  • [2] Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 385–394.
  • [3] Qingyao Ai, Yongfeng Zhang, Keping Bi, Xu Chen, and W Bruce Croft. 2017. Learning a hierarchical embedding model for personalized product search. In SIGIR. ACM, 645–654.
  • [4] Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5–32.
  • [5] Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning 11, 23-581 (2010), 81.
  • [6] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In ICML. ACM, 129–136.
  • [7] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In WSDM. ACM, 87–94.
  • [8] Yoav Freund, Raj Iyer, Robert E Schapire, and Yoram Singer. 2003. An efficient boosting algorithm for combining preferences. JMLR 4, Nov (2003), 933–969.
  • [9] Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (2001), 1189–1232.
  • [10] Anjan Goswami, Prasant Mohapatra, and Chengxiang Zhai. 2019. Quantifying and Visualizing the Demand and Supply Gap from Ecommerce Search Data using Topic Models. In Companion Proceedings of WWW. ACM, 348–353.
  • [11] Anjan Goswami, ChengXiang Zhai, and Prasant Mohapatra. 2018. Towards Optimization of E-Commerce Search and Discovery. In The 2018 SIGIR Workshop On eCommerce.
  • [12] Ruocheng Guo, Lu Cheng, Jundong Li, P Richard Hahn, and Huan Liu. 2018. A survey of learning causality with data: Problems and methods. arXiv preprint arXiv:1809.09337 (2018).
  • [13] Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Turnbull, Brendan M Collins, et al. 2019. Applying deep learning to Airbnb search. In SIGKDD. ACM, 1927–1935.
  • [14] Ziniu Hu, Yang Wang, Qu Peng, and Hang Li. 2019. Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm. In The World Wide Web Conference. ACM, 2830–2836.
  • [15] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In SIGKDD. ACM, 133–142.
  • [16] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 781–789.
  • [17] Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai. 2017. On application of learning to rank for e-commerce search. In SIGIR. ACM, 475–484.
  • [18] Alistair Moffat and Justin Zobel. 2008. Rank-biased precision for measurement of retrieval effectiveness. TOIS 27, 1 (2008), 2.
  • [19] Daria Sorokina and Erick Cantu-Paz. 2016. Amazon search: The joy of ranking products. In SIGIR. ACM, 459–460.
  • [20] Andrew Stanton, Liangjie Hong, and Manju Rajashekhar. 2018. Buzzsaw: A System for High Speed Feature Engineering. In SysML.
  • [21] Christophe Van Gysel, Maarten de Rijke, and Evangelos Kanoulas. 2016. Learning latent vector spaces for product search. In CIKM. ACM, 165–174.
  • [22] Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to rank with selection bias in personal search. In SIGIR. ACM, 115–124.
  • [23] Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position bias estimation for unbiased learning to rank in personal search. In WSDM. ACM, 610–618.
  • [24] Liang Wu, Diane Hu, Liangjie Hong, and Huan Liu. 2018. Turning clicks into purchases: Revenue optimization for product search in ecommerce. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 365–374.
  • [25] Qiang Wu, Christopher JC Burges, Krysta M Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13, 3 (2010), 254–270.
  • [26] Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. 2008. Listwise approach to learning to rank: theory and algorithm. In ICML. ACM, 1192–1199.
  • [27] Xiaohui Xie, Jiaxin Mao, Yiqun Liu, Maarten de Rijke, Yunqiu Shao, Zixin Ye, Min Zhang, and Shaoping Ma. 2019. Grid-based Evaluation Metrics for Web Image Search. (2019).