Leveraging Demonstrations for Reinforcement Recommendation Reasoning over Knowledge Graphs

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 239-248.

DOI: https://doi.org/10.1145/3397271.3401171

Abstract:

Knowledge graphs have been widely adopted to improve recommendation accuracy. The multi-hop user-item connections on knowledge graphs also endow reasoning about why an item is recommended. However, reasoning on paths is a complex combinatorial optimization problem. Traditional recommendation methods usually adopt brute-force methods to fi...

Introduction
  • Knowledge graphs (KGs), which organize auxiliary facts about items in heterogeneous graphs, have been shown to be effective in improving recommendation performance. On the one hand, the connectivity between users and items in a KG helps to better model the underlying user-item relations and improve recommendation accuracy.
  • On the other hand, the multi-hop connections between users and items enable reasoning about recommendations, which enhances explainability.
  • For example, the reason for recommending Acalme Sneaker to user Bob can be revealed by the connection Bob --Purchase--> Revolution Running Shoe --Produced_By--> Nike --Produce--> ...
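
The multi-hop explanation above can be illustrated with a minimal sketch. It stores a toy KG as an adjacency map and searches exhaustively for a user-to-item path, which is the brute-force style of path finding rather than the learned policy this paper proposes. The graph contents, the final hop to Acalme Sneaker (the source text is cut off after "Produce"), and the helper names `find_path` and `explain` are assumptions made for illustration.

```python
from collections import deque

# Toy knowledge graph: head entity -> list of (relation, tail entity).
# The hop Nike --Produce--> Acalme Sneaker is an assumption; the source
# text is truncated after "Produce".
KG = {
    "Bob": [("Purchase", "Revolution Running Shoe")],
    "Revolution Running Shoe": [("Produced_By", "Nike")],
    "Nike": [("Produce", "Acalme Sneaker"), ("Produce", "Revolution Running Shoe")],
}


def find_path(kg, user, item, max_hops=3):
    """Breadth-first search for a path of at most max_hops edges."""
    queue = deque([(user, [])])
    while queue:
        entity, path = queue.popleft()
        if entity == item and path:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in kg.get(entity, []):
            queue.append((tail, path + [(entity, relation, tail)]))
    return None


def explain(path):
    """Render a list of (head, relation, tail) hops as a readable chain."""
    parts = [path[0][0]]
    for _, relation, tail in path:
        parts.append(f"--{relation}--> {tail}")
    return " ".join(parts)


path = find_path(KG, "Bob", "Acalme Sneaker")
print(explain(path))
```

Exhaustive search like this grows quickly with graph size and hop count, which is the cost that motivates supervising path finding instead.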
Highlights
  • Knowledge graphs (KGs), which organize auxiliary facts about items in heterogeneous graphs, have been shown to be effective in improving recommendation performance. On the one hand, the connectivity between users and items in a KG helps to better model the underlying user-item relations and improve recommendation accuracy
  • We propose to guide knowledge graph reasoning with demonstrations and show how these demonstrations can be extracted with minimum labeling efforts by using our meta-heuristic-based extraction method
  • The major challenge is how to effectively model the imperfect demonstrations, the observed interactions, and the facts in the KG in a unified framework. To achieve this goal, we design an ADversarial Actor-Critic model (ADAC) that integrates actor-critic-based reinforcement learning with adversarial imitation learning
  • We design a demonstration-based knowledge graph reasoning framework for explainable recommendation, which addresses the issues related to convergence and explainability
  • We first leverage a meta-heuristic-based demonstration extractor to derive a set of path demonstrations with minimum labeling efforts
  • Experiments show that our method outperforms the state-of-the-art baselines on both recommendation accuracy and explainability
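
As a rough illustration of extracting path demonstrations without manual labels, the sketch below takes each observed user-item interaction and returns a shortest alternative path between the two nodes. It assumes a shortest-path heuristic and assumes that the direct interaction edge is skipped so the demonstration is not trivial; neither detail is confirmed by this summary. The toy graph, the type prefixes (`u:`, `i:`, `w:`), and all function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected toy graph; node names carry a type prefix
# (u: user, i: item, w: review word). None of this data is from the paper.
EDGES = [
    ("u:alice", "i:shoe"), ("u:bob", "i:shoe"), ("u:bob", "i:sock"),
    ("i:shoe", "w:running"), ("i:cap", "w:running"), ("u:alice", "i:sock"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_demo(adj, user, item, max_len=3):
    """Shortest user-to-item path that avoids the direct user-item edge."""
    queue = deque([[user]])
    seen = {user}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if len(path) - 1 == max_len:
            continue  # path already has max_len edges
        for nxt in sorted(adj.get(node, ())):
            if node == user and nxt == item:
                continue  # assumption: skip the observed interaction itself
            if nxt == item:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def extract_demonstrations(edges, interactions):
    """Return one demonstration path per interaction, where one exists."""
    adj = build_adjacency(edges)
    demos = {}
    for user, item in interactions:
        path = shortest_demo(adj, user, item)
        if path is not None:
            demos[(user, item)] = path
    return demos


demos = extract_demonstrations(EDGES, [("u:alice", "i:sock"), ("u:alice", "i:cap")])
for pair, path in demos.items():
    print(pair, "->", " - ".join(path))
```

Demonstrations obtained this way are imperfect by construction, which is why the model has to treat them as guidance rather than ground truth.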
Methods
  • The aggregated reward is learned through the policy-gradient method as in the PGPR approach. ADAC-P: ADAC-P removes the path discriminator from ADAC. ADAC-M: ADAC-M removes the meta-path discriminator from ADAC
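
The ablation variants above can be sketched as switches on an aggregated reward. The sketch assumes the reward is a convex combination of a terminal (recommendation) reward and the two discriminator scores; the exact form, the weight values, and the names `RewardConfig` and `aggregated_reward` are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    # Hypothetical mixing weights; the source does not state their values.
    alpha_path: float = 0.3
    alpha_meta: float = 0.3
    use_path_disc: bool = True   # False mimics the ADAC-P ablation
    use_meta_disc: bool = True   # False mimics the ADAC-M ablation


def aggregated_reward(terminal, path_score, meta_score, cfg):
    """Mix the terminal reward with the discriminator scores.

    A disabled discriminator contributes nothing, and its weight is
    handed back to the terminal reward so the weights still sum to one.
    """
    a_path = cfg.alpha_path if cfg.use_path_disc else 0.0
    a_meta = cfg.alpha_meta if cfg.use_meta_disc else 0.0
    return a_path * path_score + a_meta * meta_score + (1.0 - a_path - a_meta) * terminal


full = aggregated_reward(1.0, 0.5, 0.2, RewardConfig())
adac_p = aggregated_reward(1.0, 0.5, 0.2, RewardConfig(use_path_disc=False))
adac_m = aggregated_reward(1.0, 0.5, 0.2, RewardConfig(use_meta_disc=False))
print(full, adac_p, adac_m)
```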
Results
  • Results of PI

    The results suggest that ADAC can achieve a recommendation performance similar to that of SP by using the paths of interest as demonstrations.
  • While ADAC is able to find the reasoning path of the type U-I-U-I (Fig. 4A), it can generalize from U-I-U-I to U-I-W-I (Fig. 4B), which well explains why the user likes the two related items
  • This indicates that by correctly modeling users’ item-level interest at the beginning of the training process using U-I-U-I, the model can gradually learn to model users’ feature-level interest (U-I-W-I) effectively based on the connections on the knowledge graph.
  • Randomly sampling an individual word in the reviews may introduce much noise into the demonstrations
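
The meta-path types discussed above (U-I-U-I, U-I-W-I) are the sequences of entity types along a path. The sketch below shows one way to derive them and count how often each type occurs; the entity names, the type table, and the helper `meta_path` are hypothetical.

```python
from collections import Counter

# Single-letter codes for entity types, following the U/I/W notation in the text.
TYPE_CODES = {"user": "U", "item": "I", "word": "W"}


def meta_path(path, entity_types):
    """Map a sequence of entities to its meta-path type, e.g. 'U-I-W-I'."""
    return "-".join(TYPE_CODES[entity_types[entity]] for entity in path)


# Hypothetical entities and reasoning paths, for illustration only.
ENTITY_TYPES = {
    "alice": "user", "bob": "user",
    "shoe": "item", "sock": "item", "cap": "item",
    "running": "word",
}
PATHS = [
    ["alice", "shoe", "bob", "sock"],
    ["alice", "shoe", "running", "cap"],
    ["bob", "sock", "alice", "shoe"],
]

meta_path_counts = Counter(meta_path(p, ENTITY_TYPES) for p in PATHS)
print(meta_path_counts)
```

Tracking these counts over training is one simple way to observe a shift from item-level paths to feature-level paths.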
Conclusion
  • The authors design a demonstration-based knowledge graph reasoning framework for explainable recommendation, which addresses the issues related to convergence and explainability.
  • The authors first leverage a meta-heuristic-based demonstration extractor to derive a set of path demonstrations with minimum labeling efforts.
  • The authors propose an ADversarial Actor-Critic (ADAC) model for demonstration-guided path finding.
  • The authors will investigate how to leverage the reasoning paths to generate natural language explanations for the users
Summary
  • Objectives:

    The goal of this paper is to study how fast convergence and better explainability can be achieved by better supervising path finding.
  • The authors aim to solve the aforementioned issues by proposing a demonstration-based knowledge graph reasoning framework
Tables
  • Table 1: The statistics of our datasets
  • Table 2: Comparison of recommendation accuracy on three real-world datasets. The results are reported in percentage
  • Table 3: Comparison of explainability on three real-world datasets. The results are reported in percentage
  • Table 4: Explanation and recommendation performance on Beauty with different demonstration paths
  • Table 5: Explanation and recommendation performance on Clothing with different demonstration paths
Related work
  • Our work is related to the knowledge-graph-based (KG-based) recommendation and reinforcement learning (RL) in recommendation.

    5.1 KG-Based Recommendation

    Existing knowledge-graph-based recommendation methods can be divided into two groups: embedding-based and path-based.

    Embedding-based methods learn entity and relation representations with KG embedding techniques [5, 44] and integrate the learned representations into the recommendation model to improve accuracy [10, 28, 49]. For example, Huang et al. [21] captured the attribute-level user preferences and improved recommendation accuracy by using memory networks to incorporate KG representations. Wang et al. [40] fused the knowledge-level and semantic-level representations of news items to improve the prediction of their click-through rates. Cao et al. [6] devised a system that transferred knowledge by jointly learning a recommendation model and a KG completion model. These methods demonstrate the usefulness of KGs in improving recommendation accuracy and illustrate how KG embeddings allow for the flexible incorporation of knowledge. However, their indirect utilization of the knowledge graph structure prevents them from effectively modeling the connectivity patterns and sequential dependencies. As a result, their recommendation accuracy and reasoning capability are limited. For example, they cannot explain their recommendation by providing a path in the KG that connects a user with the recommended item.
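
    To make the embedding-based family concrete, here is a minimal sketch of the translation-style scoring used by TransE [5], where a triple (head, relation, tail) is plausible when head + relation lies close to tail. The two-dimensional vectors and the entity names are made up for illustration; real systems learn such embeddings from data.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail (higher is better)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-d embeddings, not learned from any dataset.
user = [0.1, 0.2]
purchase = [0.3, 0.1]
items = {
    "item_a": [0.4, 0.3],   # close to user + purchase
    "item_b": [0.9, 0.9],   # far from user + purchase
}

# Rank candidate items by how plausible (user, purchase, item) looks.
ranked = sorted(items, key=lambda name: transe_score(user, purchase, items[name]), reverse=True)
print(ranked)
```

    The score ranks items but yields no path through the graph, which illustrates the limitation noted above: an embedding score alone does not explain why an item was recommended.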
Funding
  • This work was supported by NSFC (91646202), National Key R&D Program of China (2018YFB1404401, 2018YFB1402701)
Reference
  • [1] Qingyao Ai, Vahid Azizi, Xu Chen, and Yongfeng Zhang. 2018. Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation. Algorithms 11, 9 (2018), 137.
  • [2] Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. An Actor-Critic Algorithm for Sequence Prediction. In ICLR (Poster).
  • [3] R Bellman. 2013. Dynamic Programming, Courier Corporation. New York, NY 707 (2013).
  • [4] Christian Blum and Andrea Roli. 2003. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM computing surveys (CSUR) 35, 3 (2003), 268–308.
  • [5] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In NIPS. 2787–2795.
  • [6] Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, and Tat-Seng Chua. 2019. Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences. In WWW. ACM, 151–161.
  • [7] Rose Catherine and William W. Cohen. 2016. Personalized Recommendations using Knowledge Graphs: A Probabilistic Logic Programming Approach. In RecSys. ACM, 325–332.
  • [8] Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Hai-Kuan Huang, and Hai-Hong Tang. 2018. Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation. In KDD. ACM, 1187–1196.
  • [9] Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song. 2019. Generative Adversarial User Model for Reinforcement Learning Based Recommendation System. In ICML. PMLR, 1052–1061.
  • [10] Zhongxia Chen, Xiting Wang, Xing Xie, Mehul Parsana, Akshay Soni, Xiang Ao, and Enhong Chen. 2020. Towards Explainable Conversational Recommendation. In IJCAI.
  • [11] Zhongxia Chen, Xiting Wang, Xing Xie, Tong Wu, Guoqing Bu, Yining Wang, and Enhong Chen. 2019. Co-attentive multi-task learning for explainable recommendation. In IJCAI. 2137–2143.
  • [12] Edsger W Dijkstra et al. 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1 (1959), 269–271.
  • [13] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation. In KDD. ACM, 2478–2486.
  • [14] Jingyue Gao, Xiting Wang, Yasha Wang, and Xing Xie. 2019. Explainable Recommendation Through Attentive Multi-View Learning. In AAAI.
  • [15] Thomas R Gruber, Adam J Cheyer, and Donald W Pitschel. 2016. Crowd sourcing information to fulfill user requests. US Patent 9,280,610.
  • [16] Tao Gui, Peng Liu, Qi Zhang, Liang Zhu, Minlong Peng, Yunhua Zhou, and Xuanjing Huang. 2019. Mention Recommendation in Twitter with Cooperative Multi-Agent Reinforcement Learning. In SIGIR. ACM, 535–544.
  • [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. IEEE Computer Society, 1026–1034.
  • [18] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 241–250.
  • [19] Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. In NIPS. 4565–4573.
  • [20] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S. Yu. 2018. Leveraging Metapath based Context for Top-N Recommendation with A Neural Co-Attention Model. In KDD. ACM, 1531–1540.
  • [21] Jin Huang, Wayne Xin Zhao, Hong-Jian Dou, Ji-Rong Wen, and Edward Y. Chang. 2018. Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In SIGIR. ACM, 505–514.
  • [22] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR (Poster).
  • [23] Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155 (2016).
  • [24] Piji Li, Zihao Wang, Zhaochun Ren, Lidong Bing, and Wai Lam. 2017. Neural rating regression with abstractive tips generation for recommendation. In SIGIR. 345–354.
  • [25] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In ICLR (Poster).
  • [26] László Lovász et al. 1993. Random walks on graphs: A survey. Combinatorics, Paul Erdős is Eighty 2, 1 (1993), 1–46.
  • [27] Weizhi Ma, Min Zhang, Yue Cao, Woojeong Jin, Chenyang Wang, Yiqun Liu, Shaoping Ma, and Xiang Ren. 2019. Jointly Learning Explainable Rules for Recommendation with Knowledge Graph. In WWW. ACM, 1210–1221.
  • [28] Enrico Palumbo, Giuseppe Rizzo, and Raphaël Troncy. 2017. entity2rec: Learning User-Item Relatedness from Knowledge Graphs for Top-N Item Recommendation. In RecSys. ACM, 32–36.
  • [29] Dean Pomerleau. 1991. Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Computation 3, 1 (1991), 88–97.
  • [30] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461.
  • [31] Wenjie Shang, Yang Yu, Qingyang Li, Zhiwei Qin, Yiping Meng, and Jieping Ye. 2019. Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation. In KDD. ACM, 566–576.
  • [32] Guy Shani, David Heckerman, and Ronen I. Brafman. 2005. An MDP-Based Recommender System. J. Mach. Learn. Res. 6 (2005), 1265–1295.
  • [33] Amit Sharma and Dan Cosley. 2013. Do social explanations work?: studying and modeling the effects of social explanations in recommender systems. In WWW. ACM, 1133–1144.
  • [34] Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations 14, 2 (2012), 20–28.
  • [35] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph embedding for effective recommendation. In RecSys. ACM, 297–305.
  • [36] Richard S. Sutton. 1988. Learning to Predict by the Methods of Temporal Differences. Machine Learning 3 (1988), 9–44.
  • [37] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement learning - an introduction. MIT Press.
  • [38] Nava Tintarev and Judith Masthoff. 2007. A survey of explanations in recommender systems. In ICDE workshop. IEEE, 801–810.
  • [39] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In CIKM. ACM, 417–426.
  • [40] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep Knowledge-Aware Network for News Recommendation. In WWW. ACM, 1835–1844.
  • [41] Lu Wang, Wei Zhang, Xiaofeng He, and Hongyuan Zha. 2018. Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation. In KDD. ACM, 2447–2456.
  • [42] Xiting Wang, Yiru Chen, Jie Yang, Le Wu, Zhengtao Wu, and Xing Xie. 2018. A Reinforcement Learning Framework for Explainable Recommendation. In ICDM. IEEE, 587–596.
  • [43] Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, and Tat-Seng Chua. 2019. Explainable Reasoning over Knowledge Graphs for Recommendation. In AAAI. AAAI Press, 5329–5336.
  • [44] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI. AAAI Press, 1112–1119.
  • [45] Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8 (1992), 229–256.
  • [46] Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, and Yongfeng Zhang. 2019. Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. In SIGIR. ACM, 285–294.
  • [47] Wenhan Xiong, Thien Hoang, and William Yang Wang. 2017. Deeppath: A reinforcement learning method for knowledge graph reasoning. In EMNLP.
  • [48] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: a heterogeneous information network approach. In WSDM. ACM, 283–292.
  • [49] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative Knowledge Base Embedding for Recommender Systems. In KDD. ACM, 353–362.
  • [50] Jing Zhang, Bowen Hao, Bo Chen, Cuiping Li, Hong Chen, and Jimeng Sun. 2019. Hierarchical Reinforcement Learning for Course Recommendation in MOOCs. In AAAI. AAAI Press, 435–442.
  • [51] Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. 2017. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In CIKM. ACM, 1449–1458.
  • [52] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In SIGIR. 83–92.
  • [53] Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks. In KDD. ACM, 635–644.
  • [54] Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2018. Deep reinforcement learning for page-wise recommendations. In RecSys. ACM, 95–103.
  • [55] Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin. 2018. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. In KDD. ACM, 1040–1048.
  • [56] Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A Deep Reinforcement Learning Framework for News Recommendation. In WWW. ACM, 167–176.