ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 579–588.

DOI: https://doi.org/10.1145/3397271.3401043

Abstract:

Most ranking models are trained only with displayed items (mostly hot items), but they are used to retrieve items in the entire space, which consists of both displayed and non-displayed items (mostly long-tail items). Due to the sample selection bias, the long-tail items lack sufficient records to learn good feature representations […]

Introduction
  • A typical formulation of the ranking model is to provide a ranked list of items given a query
  • It has a wide range of applications, including recommender systems [22, 33], search systems [4, 20], and so on.
  • Worse, this training strategy skews the model towards popular items [23]: these models usually retrieve hot items while ignoring long-tail items that may be more suitable, especially newly arrived ones
  • This phenomenon is called "Matthew Effect" [29].
  • As shown in Fig. 2a, the existence of domain shift [5] makes it difficult for these ranking models to retrieve long-tail items, because they tend to overfit the hot items
Highlights
  • A typical formulation of the ranking model is to provide a ranked list of items given a query
  • We find that the integration of L_DA can significantly reduce the disparity between distributions (Fig. 6b), which shows that the correlation between item high-level attributes can reflect the domain distribution well
  • We propose entire space adaptation model (ESAM) to improve long-tail performance with discriminative domain adaptation by introducing non-displayed items
  • It is worth mentioning that ESAM is a general framework, which can be integrated into many existing ranking models
  • The offline experiments on two public datasets and a Taobao industrial dataset prove that ESAM can be integrated into the existing SOTA baselines to improve retrieval performance, especially in the long-tail space
  • Online experiments further demonstrate the superiority of ESAM at Taobao search engine
Methods
  • The authors first briefly introduce the basic ranking framework named BaseModel.
  • ESAM, which contains the proposed A2C and two regularization strategies, is integrated into the BaseModel for better item feature representation learning in the entire space (a minimal sketch of the A2C alignment loss follows this list).
  • The source domain is denoted as Ds, the target domain as Dt, and the entire item space as D = Ds ∪ Dt. The source and target domains share the same query set Q.
  • Baseline models integrated with ESAM: NeuralMF [19], YoutubeNet [12], RALM [29], and BST [8]
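The core of A2C is an attribute correlation alignment loss (the L_DA referenced in the Results) that treats the correlation between item high-level attributes as the knowledge to transfer from displayed to non-displayed items. Below is a minimal PyTorch sketch of one plausible reading of this loss; the batch-level Gram matrices and the 1/L^2 scaling are assumptions of the sketch, not the authors' exact implementation.

    import torch

    def attribute_correlation_alignment(src_feat: torch.Tensor,
                                        tgt_feat: torch.Tensor) -> torch.Tensor:
        # src_feat: (n_s, L) features of displayed items (source domain Ds).
        # tgt_feat: (n_t, L) features of non-displayed items (target domain Dt).
        # Each of the L feature dimensions is treated as one high-level attribute.
        L = src_feat.size(1)
        # (L, L) attribute-correlation (Gram) matrices computed over each batch.
        cov_src = src_feat.t() @ src_feat
        cov_tgt = tgt_feat.t() @ tgt_feat
        # Squared Frobenius distance between the two correlation matrices.
        return ((cov_src - cov_tgt) ** 2).sum() / (L ** 2)

In training, this term would be added to the BaseModel's ranking loss on displayed items, weighted by a hyperparameter, together with the two regularization strategies.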
Results
  • The authors find that the integration of L_DA can significantly reduce the disparity between distributions (Fig. 6b), which shows that the correlation between item high-level attributes can reflect the domain distribution well.
Conclusion
  • The authors propose ESAM to improve long-tail performance with discriminative domain adaptation by introducing non-displayed items.
  • To the best of their knowledge, this is the first work to adopt domain adaptation with non-displayed items for ranking models.
  • The offline experiments on two public datasets and a Taobao industrial dataset prove that ESAM can be integrated into the existing SOTA baselines to improve retrieval performance, especially in the long-tail space.
  • The authors verify the necessity of each constraint by ablation studies
Tables
  • Table1: Analysis of distribution distance (L_DA) extracted by BaseModel on the Industrial dataset
  • Table2: Statistics of experimental datasets. ’M’ stands for million and ’B’ for billion
  • Table3: Performance comparison between the methods without and with ESAM on the MovieLens-1M dataset. “Hot” represents hot items in the test set, “Long-tail” represents long-tail items in the test set, and “Entire” represents all items in the test set. The best results are highlighted in boldface. The improvements over the baselines are statistically significant at the 0.05 level
  • Table4: Performance comparison between methods without and with ESAM on the CIKM Cup 2016 dataset. The best results are highlighted in boldface. The improvements over the baselines are statistically significant at the 0.05 level
  • Table5: Cold-start performance of BST [8] w/o and w/ ESAM
  • Table6: Ablation study on Industrial dataset
Related work
  • 5.1 Neural Network-Based Ranking Model

    Recently, deep neural networks have been applied to many ranking-based applications; a significant development is deep learning to rank (LTR) [3, 11, 19, 25, 35, 47]. To tackle the long-tail problem, some methods utilize DA techniques, such as maximum mean discrepancy (MMD) [38] and adversarial training [23], to alleviate the inconsistent distributions of the source and target domains. Also, some methods [2, 43] introduce an unbiased dataset obtained from an unbiased system (i.e., one that randomly selects items from the entire item pool for a query) to train an unbiased model. Besides, some methods introduce auxiliary information [43, 44] or auxiliary domains [15, 30] to obtain more long-tail information. Unlike previous approaches, ESAM combines domain adaptation with non-displayed items to improve long-tail performance without any auxiliary information or auxiliary domains. Moreover, we design a novel DA technique, named attribute correlation alignment, which regards the correlation between item high-level attributes as the knowledge to transfer.
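For context, maximum mean discrepancy compares two feature distributions through the distance between their mean kernel embeddings. A minimal single-bandwidth RBF sketch (the bandwidth sigma and the biased estimator are illustrative choices, not taken from [38]):

    import torch

    def gaussian_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        # x: (n, d) source-domain features; y: (m, d) target-domain features.
        def rbf(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            # Pairwise squared Euclidean distances mapped through a Gaussian kernel.
            d2 = torch.cdist(a, b) ** 2
            return torch.exp(-d2 / (2.0 * sigma ** 2))
        # Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2*E[k(x,y)].
        return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

Minimizing such a term pulls the two distributions together in feature space; ESAM's attribute correlation alignment instead matches second-order correlations between high-level attributes (see the sketch under Methods).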
Funding
  • This work was supported by Alibaba Group through the Alibaba Innovative Research Program and by the National Natural Science Foundation of China (No. 61872278)
References
  • Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In ICML. 41–48.
  • Stephen Bonner and Flavian Vasile. 2018. Causal embeddings for recommendation. In RecSys. 104–112.
  • Christopher Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N Hullender. 2005. Learning to rank using gradient descent. In ICML. 89–96.
  • Christopher JC Burges. 2010. From RankNet to LambdaRank to LambdaMART: An overview. Learning 11, 23-581 (2010), 81.
  • J Quiñonero Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. 2009. Dataset shift in machine learning.
  • Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In ICML. 129–136.
  • Chao Chen, Zhihong Chen, Boyuan Jiang, and Xinyu Jin. 2019. Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation. In AAAI. 3296–3303.
  • Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba. CoRR abs/1905.06874 (2019).
  • Zhihong Chen, Chao Chen, Zhaowei Cheng, Ke Fang, and Xinyu Jin. 2019. Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation. arXiv preprint arXiv:1905.10756 (2019).
  • Zhihong Chen, Chao Chen, Xinyu Jin, Yifu Liu, and Zhaowei Cheng. 2019. Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation. Neural Computing and Applications (2019), 1–14.
  • Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In DLRS. 7–10.
  • Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In RecSys. 191–198.
  • Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML. 647–655.
  • Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17, 1 (2016), 2096–2030.
  • Chen Gao, Xiangning Chen, Fuli Feng, Kai Zhao, Xiangnan He, Yong Li, and Depeng Jin. 2019. Cross-domain Recommendation Without Sharing User-relevant Data. In WWW. 491–502.
  • Yves Grandvalet and Yoshua Bengio. 2005. Semi-supervised learning by entropy minimization. In NIPS. 529–536.
  • Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality reduction by learning an invariant mapping. In CVPR. 1735–1742.
  • Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
  • Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM. 2333–2338.
  • Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In KDD. 133–142.
  • Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami, and Taiji Suzuki. 2019. Cross-domain recommendation via deep domain adaptation. In ECIR. 20–29.
  • Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
  • Adit Krishnan, Ashish Sharma, Aravind Sankar, and Hari Sundaram. 2018. An Adversarial Approach to Improve Long-Tail Performance in Neural Collaborative Filtering. In CIKM. 1491–1494.
  • Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2017. The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. In KDD. 275–284.
  • Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Pipei Huang, Huan Zhao, Guoliang Kang, Qiwei Chen, Wei Li, and Dik Lun Lee. 2019. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. CoRR abs/1904.08030 (2019).
  • Ping Li, Qiang Wu, and Christopher J Burges. 2008. McRank: Learning to rank using multiple classification and gradient boosting. In NIPS. 897–904.
  • Dawen Liang, Rahul G Krishnan, Matthew D Hoffman, and Tony Jebara. 2018. Variational autoencoders for collaborative filtering. In WWW. 689–698.
  • Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-margin softmax loss for convolutional neural networks. In ICML. 7.
  • Yudan Liu, Kaikai Ge, Xu Zhang, and Leyu Lin. 2019. Real-time Attention Based Look-alike Model for Recommender System. CoRR abs/1906.05022 (2019).
  • Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach. In IJCAI. 2464–2470.
  • Ramesh Nallapati. 2004. Discriminative models for information retrieval. In SIGIR. 64–71.
  • Sinno Jialin Pan and Qiang Yang. 2009. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2009), 1345–1359.
  • Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
  • Badrul Munir Sarwar, George Karypis, Joseph A Konstan, John Riedl, et al. 2001. Item-based collaborative filtering recommendation algorithms. In WWW. 285–295.
  • Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to rank short text pairs with convolutional deep neural networks. In SIGIR. 373–382.
  • Ajit P Singh and Geoffrey J Gordon. 2008. Relational learning via collective matrix factorization. In KDD. 650–658.
  • Baochen Sun and Kate Saenko. 2016. Deep CORAL: Correlation alignment for deep domain adaptation. In ECCV. 443–450.
  • Brandon Tran, Maryam Karimzadehgan, Rama Kumar Pasumarthi, Michael Bendersky, and Donald Metzler. 2019. Domain Adaptation for Enterprise Email Search. CoRR abs/1906.07897 (2019).
  • Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. 2017. Adversarial discriminative domain adaptation. In CVPR. 7167–7176.
  • Maksims Volkovs and Guang Wei Yu. 2015. Effective latent models for binary feedback in recommender systems. In SIGIR. 313–322.
  • Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. 2017. IRGAN: A minimax game for unifying generative and discriminative information retrieval models. In SIGIR. 515–524.
  • Yang Zhang, Fuli Feng, Chenxu Wang, Xiangnan He, Meng Wang, Yan Li, and Yongdong Zhang. 2020. How to Retrain a Recommender System? A Sequential Meta-Learning Approach. In SIGIR.
  • Bowen Yuan, Jui-Yang Hsia, Meng-Yuan Yang, Hong Zhu, Chih-Yao Chang, Zhenhua Dong, and Chih-Jen Lin. 2019. Improving Ad Click Prediction by Considering Non-displayed Events. In CIKM. 329–338.
  • Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns. CoRR abs/1905.10760 (2019).
  • Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. 2017. Central moment discrepancy (CMD) for domain-invariant representation learning. CoRR abs/1702.08811 (2017).
  • Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In AAAI. 5941–5948.
  • Ziwei Zhu, Jianling Wang, and James Caverlee. 2019. Improving Top-K Recommendation via Joint Collaborative Autoencoders. In WWW. 3483–3482.