Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 179-188.

DOI: https://doi.org/10.1145/3397271.3401174

Abstract:

Interactive recommender system (IRS) has drawn huge attention because of its flexible recommendation strategy and the consideration of optimal long-term user experiences. To deal with the dynamic user preference and optimize accumulative utilities, researchers have introduced reinforcement learning (RL) into IRS. However, RL methods share...

Introduction
  • With the wide use of mobile applications such as TikTok, Pandora radio and Instagram feeds, interactive recommender systems (IRS) have received much attention in recent years [29, 43].
  • In MAB-based models, the user preference is often modeled by a linear function that is continuously learned through interactions, with a proper exploration-exploitation tradeoff (a minimal sketch follows this list).
  • These MAB-based models presume that the underlying user preference remains unchanged during the recommendation process, i.e., they do not model the dynamic transitions of user preferences [43].
  • The key advantage of modern IRS is to learn the possible dynamic transitions of the user's preference and optimize the long-term utility.
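To make the linear MAB formulation above concrete, here is a minimal LinUCB-style sketch in the spirit of Li et al. [20]. It is an illustration under assumptions, not the paper's method: the disjoint-arm setting, the class and method names, and the fixed item-feature vectors are all hypothetical.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB-style linear bandit (illustrative sketch): the user
    preference is a linear function of item features, learned online with
    an upper-confidence bonus for exploration."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha      # exploration strength
        self.A = np.eye(dim)    # ridge-regression Gram matrix
        self.b = np.zeros(dim)  # accumulated reward-weighted features

    def select(self, item_features):
        """item_features: (n_items, dim) array of candidate feature vectors."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # current estimate of the user preference
        means = item_features @ theta
        # UCB bonus: sqrt(x^T A^{-1} x) for each candidate item x
        bonus = self.alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', item_features, A_inv, item_features))
        return int(np.argmax(means + bonus))

    def update(self, x, reward):
        """x: feature vector of the recommended item; reward: user feedback."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

Note that the preference vector theta is assumed stationary here; as the list above points out, that is exactly the limitation that motivates RL-based IRS.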
Highlights
  • With the wide use of mobile applications such as TikTok, Pandora radio and Instagram feeds, interactive recommender systems (IRS) have received much attention in recent years [29, 43]
  • Note that this paper mainly focuses on how to incorporate a knowledge graph (KG) into deep reinforcement learning (DRL) methods for IRS.
  • The statistics of the two datasets are presented in Table 2. We choose these two typical datasets because our work focuses on incorporating a KG into reinforcement learning (RL)-based models for IRS.
  • Our proposed knowledge graph enhanced Q-learning framework (KGQR) can match the performance of the other RL-based methods with the fewest interactions.
  • We proposed the knowledge graph enhanced Q-learning framework (KGQR) for interactive recommendation.
  • Comprehensive experiments in a carefully designed simulation environment based on two real-world datasets demonstrate that our model leads to significantly better performance with higher sample efficiency than state-of-the-art methods.
Methods
  • The overview of the proposed framework is shown in Figure 1.
  • At a given time step during one recommendation session, the IRS models the user's preference $s_t$ from the interaction history $o_t$ combined with the knowledge graph $\mathcal{G}$, via a graph convolution module and a state representation module.
  • The details of these two representation learning modules will be discussed in Section 4.1.
  • The authors introduce the candidate selection module and the deep Q-network module in Section 4.2 and Section 4.3, respectively; a minimal sketch of the full pipeline follows this list.
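Below is a minimal sketch of this pipeline in PyTorch (the experiments are implemented in PyTorch [24]). Everything concrete here is an assumption for illustration: the single-step dense-adjacency graph convolution, the GRU state tracker, the layer sizes, and all names; the authors' exact architecture is what Sections 4.1-4.3 specify.

```python
import torch
import torch.nn as nn

class KGQRSketch(nn.Module):
    """Illustrative sketch of the described pipeline: GCN-refined item
    embeddings -> recurrent state representation -> Q-values over a
    neighbor-based candidate set. Sizes and names are assumptions."""

    def __init__(self, n_items, dim=50):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)  # e.g., initialized from KG embeddings
        self.gcn = nn.Linear(dim, dim)              # one propagation step over KG neighbors
        self.state_rnn = nn.GRU(dim, dim, batch_first=True)  # state representation module
        self.q_head = nn.Linear(2 * dim, 1)         # Q(s, i) from [state; item]

    def gcn_embed(self, items, adj):
        # One graph-convolution step: aggregate neighbor embeddings along
        # KG edges (dense adjacency for simplicity), then transform.
        h = self.item_emb.weight                    # (n_items, d)
        h = torch.relu(self.gcn(adj @ h))
        return h[items]

    def forward(self, history, candidates, adj):
        # history: (B, T) clicked item ids; candidates: (B, C) candidate ids
        h_hist = self.gcn_embed(history, adj)       # (B, T, d)
        _, s = self.state_rnn(h_hist)               # user state s_t: (1, B, d)
        s = s.squeeze(0).unsqueeze(1).expand(-1, candidates.size(1), -1)
        h_cand = self.gcn_embed(candidates, adj)    # (B, C, d)
        q = self.q_head(torch.cat([s, h_cand], dim=-1)).squeeze(-1)
        return q                                    # (B, C) Q-values
```

Scoring only the neighbor-based candidate set (the `candidates` argument) rather than the full item catalog is what the candidate selection module contributes; this sketch takes that set as given.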
Results
  • Evaluation Metrics

    Three evaluation metrics are used.

    Average Reward. As an IRS aims to maximize the reward of the whole episode, a straightforward evaluation measure is the average reward over each interaction of test users.
  • $\text{Average Reward} = \frac{1}{\#\text{users} \times T} \sum_{\text{users}} \sum_{t=1}^{T} \gamma^{t} R(s_t, i_t)$ (17)
  • The authors also report precision and recall over the $T$ timesteps of interaction, which are widely used metrics in traditional recommendation tasks.
  • $\text{Precision@}T = \frac{1}{T} \sum_{t=1}^{T} \theta_{hit}^{(t)}$, where $\theta_{hit}^{(t)}$ indicates whether the item recommended at step $t$ is clicked
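A short sketch of how these metrics could be computed from logged interactions; the per-user/per-timestep array layout and the recall denominator (each user's count of relevant items) are assumptions, not the paper's code.

```python
import numpy as np

def average_reward(rewards, gamma):
    """Eq. (17): mean discounted reward over users.
    rewards: (n_users, T) array with rewards[u, t-1] = R(s_t, i_t)."""
    n_users, T = rewards.shape
    discount = gamma ** np.arange(1, T + 1)   # gamma^t for t = 1..T
    return (rewards * discount).sum() / (n_users * T)

def precision_at_T(hits):
    """Precision@T averaged over users: fraction of the T recommendations
    that hit. hits: (n_users, T) 0/1 array of theta_hit indicators."""
    return hits.mean(axis=1).mean()

def recall_at_T(hits, n_relevant):
    """Recall@T averaged over users: hits over the user's total relevant
    items (assumed definition). n_relevant: (n_users,) positive counts."""
    return (hits.sum(axis=1) / n_relevant).mean()
```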
Conclusion
  • The authors proposed the knowledge graph enhanced Q-learning framework (KGQR) for interactive recommendation.
  • To the best of the authors' knowledge, it is the first work leveraging a KG in RL-based interactive recommender systems, which to a large extent addresses the sample-complexity issue and significantly improves performance.
  • The model propagates user preference among correlated items in the graph to deal with the extremely sparse user feedback problem in IRS.
  • All these designs improve sample efficiency, a common issue in previous works.
  • Comprehensive experiments in a carefully designed simulation environment based on two real-world datasets demonstrate that the model leads to significantly better performance with higher sample efficiency than state-of-the-art methods.
Summary
  • Objectives: The authors aim to study the following research questions (RQs):
  • RQ1: How does KGQR perform compared with state-of-the-art interactive recommendation methods?
  • RQ2: Does KGQR improve sample efficiency?
  • RQ3: How do different components (i.e., KG-enhanced state representation, GCN-based task-specific representation learning, neighbor-based candidate selection) affect the performance of KGQR?
Tables
  • Table1: Notations and descriptions
  • Table2: Statistics of the datasets
  • Table3: Overall Performance Comparison
  • Table4: Sample Efficiency Comparison: number of interactions needed to achieve reward 0.5, 1.0, 1.5, and 2.0 on each dataset
  • Table5: Comparison of Different KGQR Variants
  • Table6: Ablation Study of KGQR
Related work
  • Traditional KG-enhanced recommendation models can be classified into three categories: path-based, embedding-based, and hybrid methods.
  • In path-based methods [25, 36, 40], the KG is treated as a heterogeneous information network (HIN), in which specific meta-paths/meta-graphs are manually designed to represent different patterns of connections. The performance of these methods depends heavily on the hand-crafted meta-paths, which are hard to design.
  • In embedding-based methods, entity embeddings extracted from the KG via knowledge graph embedding (KGE) algorithms (such as TransE [2], TransD [15], and TransR [21]) are used to better represent items in recommendation (a short TransE sketch follows this list). Zhang et al. [38] propose Collaborative Knowledge Base Embedding (CKE) to jointly learn the latent representations in collaborative filtering as well as items' semantic representations from the knowledge base, including the KG, texts, and images. MKR [31] couples the embedding learning on the KG with the recommendation task by cross&compress units. KSR [14] extends a GRU-based sequential recommender by integrating it with a knowledge-enhanced key-value memory network.
  • In hybrid methods, researchers combine the above two categories to learn user/item embeddings by exploiting high-order connectivity in the KG. RippleNet [30] is a memory-network-like model that propagates users' potential preferences along links in the KG. Inspired by the development of graph neural networks [9, 17, 28], KGAT [34] applies the graph attention network [28] framework to a collaborative knowledge graph to learn user, item, and entity embeddings in an end-to-end manner.
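As a flavor of the embedding-based line, the translational principle behind TransE [2] fits in a few lines. This is a sketch of the scoring function and margin loss only (function names and tensor shapes are illustrative assumptions), not a full training loop.

```python
import torch

def transe_score(h, r, t, p=1):
    """TransE [2] plausibility: a true triple (head, relation, tail) should
    satisfy h + r ~ t, so a lower distance means a more plausible triple.
    h, r, t: (..., dim) embedding tensors."""
    return torch.norm(h + r - t, p=p, dim=-1)

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: push corrupted (negative) triples to score at
    least `margin` worse than observed ones."""
    return torch.clamp(margin + pos_score - neg_score, min=0).mean()
```

In KG-enhanced recommenders, embeddings pretrained with such objectives typically initialize or regularize the item representations used by the recommendation model.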
Funding
  • The work is also sponsored by the Huawei Innovation Research Program.
Reference
  • [1] Richard Bellman. 1952. On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America 38, 8 (1952), 716.
  • [2] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NeurIPS'13. 2787–2795.
  • [3] Haokun Chen, Xinyi Dai, Han Cai, Weinan Zhang, Xuejian Wang, Ruiming Tang, Yuzhou Zhang, and Yong Yu. 2019. Large-scale interactive recommendation with tree-structured policy gradient. In AAAI'19, Vol. 33. 3312–3320.
  • [4] Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H Chi. 2019. Top-k off-policy correction for a REINFORCE recommender system. In WSDM'19. ACM, 456–464.
  • [5] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
  • [6] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
  • [7] Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015).
  • [8] Artem Grotov and Maarten de Rijke. 2016. Online learning to rank for information retrieval: SIGIR 2016 tutorial. In SIGIR'16. ACM, 1215–1218.
  • [9] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS'17. 1024–1034.
  • [10] Matthew Hausknecht and Peter Stone. 2015. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI Fall Symposium Series.
  • [11] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WebConf'17. International World Wide Web Conferences Steering Committee, 173–182.
  • [12] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In ICLR'16.
  • [13] Yujing Hu, Qing Da, Anxiang Zeng, Yang Yu, and Yinghui Xu. 2018. Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application. In SIGKDD'18. ACM, 368–377.
  • [14] Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018. Improving sequential recommendation with knowledge-enhanced memory networks. In SIGIR'18. ACM, 505–514.
  • [15] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In IJCNLP'15. 687–696.
  • [16] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR'15.
  • [17] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • [18] Yehuda Koren. 2008. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In SIGKDD'08. ACM, 426–434.
  • [19] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
  • [20] Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In WebConf'10. ACM, 661–670.
  • [21] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI'15.
  • [22] Tariq Mahmood and Francesco Ricci. 2007. Learning and adaptivity in interactive recommender systems. In Proceedings of the Ninth International Conference on Electronic Commerce. ACM, 75–84.
  • [23] Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. 2015. Language understanding for text-based games using deep reinforcement learning. In EMNLP'15.
  • [24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS'19. 8024–8035.
  • [25] Chuan Shi, Zhiqiang Zhang, Ping Luo, Philip S Yu, Yading Yue, and Bin Wu. 2015. Semantic path based personalized recommendation on weighted heterogeneous information networks. In CIKM'15. ACM, 453–462.
  • [26] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484.
  • [27] Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In AAAI'16.
  • [28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
  • [29] Huazheng Wang, Qingyun Wu, and Hongning Wang. 2017. Factorization bandits for interactive recommendation. In AAAI'17.
  • [30] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018. RippleNet: Propagating user preferences on the knowledge graph for recommender systems. In CIKM'18. ACM, 417–426.
  • [31] Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Multi-task feature learning for knowledge graph enhanced recommendation. In WebConf'19. ACM, 2000–2010.
  • [32] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge graph convolutional networks for recommender systems. In WebConf'19. ACM, 3307–3313.
  • [33] Jun Wang, Arjen P De Vries, and Marcel JT Reinders. 2006. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In SIGIR'06. ACM, 501–508.
  • [34] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge graph attention network for recommendation. In SIGKDD'19.
  • [35] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. 2016. Dueling network architectures for deep reinforcement learning. In ICML'16.
  • [36] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In WSDM'14. ACM, 283–292.
  • [37] Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, and Tao Li. 2016. Online context-aware recommendation with time varying multi-armed bandit. In SIGKDD'16. ACM, 2025–2034.
  • [38] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In SIGKDD'16. ACM, 353–362.
  • [39] Weinan Zhang, Ulrich Paquet, and Katja Hofmann. 2016. Collective noise contrastive estimation for policy transfer learning. In AAAI'16.
  • [40] Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In SIGKDD'17. ACM, 635–644.
  • [41] Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2018. Deep reinforcement learning for page-wise recommendations. In RecSys'18. ACM, 95–103.
  • [42] Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin. 2018. Recommendations with negative feedback via pairwise deep reinforcement learning. In SIGKDD'18. ACM, 1040–1048.
  • [43] Xiaoxue Zhao, Weinan Zhang, and Jun Wang. 2013. Interactive collaborative filtering. In CIKM'13. ACM, 1411–1420.
  • [44] Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In WebConf'18. International World Wide Web Conferences Steering Committee, 167–176.
  • [45] Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. 2019. Reinforcement learning to optimize long-term user engagement in recommender systems. arXiv preprint arXiv:1902.05570 (2019).