SemRec: a personalized semantic recommendation method based on weighted heterogeneous information networks

World Wide Web, pp. 153-184, 2019.

DOI: https://doi.org/10.1007/s11280-018-0553-6

Abstract:

Recently, heterogeneous information network (HIN) analysis has attracted a lot of attention, and many data mining tasks have been explored on HINs. As an important data mining task, a recommender system involves many object types (e.g., users, movies, actors, and interest groups in movie recommendation) and the rich relations among objec...

Introduction
  • There is a surge of research on Heterogeneous Information Network (HIN) in which objects are of different types and links among objects represent different relations [26, 29].
  • If movies are recommended by following this meta path, the recommended movies are those seen by users who share viewing records with the given user
  • It corresponds to the collaborative filtering model in essence.
  • The authors can directly recommend items based on the similar users generated by different meta paths connecting users
  • It can realize different recommendation models through properly setting meta paths.
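The idea of recommending items through similar users found along a meta path can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy viewing records, the function names `path_count_similarity` and `predict_rating`, and the choice of a PathSim-style normalized path count along User-Movie-User are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical toy data (not from the paper): user -> {movie: rating}.
ratings: Dict[str, Dict[str, float]] = {
    "u1": {"m1": 5.0, "m2": 1.0},
    "u2": {"m1": 4.0, "m2": 2.0, "m3": 5.0},
    "u3": {"m2": 5.0, "m3": 1.0},
}


def path_count_similarity(a: str, b: str) -> float:
    """Similarity of two users along User-Movie-User.

    The number of shared movies equals the number of path instances
    between the two users; it is normalized PathSim-style by the
    number of movies each user has seen.
    """
    shared = len(set(ratings[a]) & set(ratings[b]))
    return 2.0 * shared / (len(ratings[a]) + len(ratings[b]))


def predict_rating(user: str, movie: str) -> float:
    """Similarity-weighted average of the ratings other users gave `movie`."""
    numerator = 0.0
    denominator = 0.0
    for other, seen in ratings.items():
        if other == user or movie not in seen:
            continue
        sim = path_count_similarity(user, other)
        numerator += sim * seen[movie]
        denominator += sim
    # Fall back to 0.0 when no similar user has rated the movie.
    return numerator / denominator if denominator > 0 else 0.0
```

Swapping `path_count_similarity` for a similarity computed along a different meta path changes which users count as neighbours, which is how different recommendation models arise from different paths.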
Highlights
  • In recent years, there is a surge of research on Heterogeneous Information Network (HIN) in which objects are of different types and links among objects represent different relations [26, 29]
  • We propose weighted HIN and weighted meta path to consider attribute values on links in information networks
  • We extend conventional HIN and meta path for information networks with attribute values on links, and apply them to recommender systems
  • We propose weighted HIN and weighted meta path to more subtly depict object relations by distinguishing link attribute values, and put forward a similarity measure strategy based on weighted meta paths
  • Extensive experiments illustrate the effectiveness of the Semantic path based personalized Recommendation method (SemRec)
  • On the 20% training set of Douban Movie, SemRecReg outperforms PMF by up to 19.55% on RMSE and 15.89% on Mean Absolute Error (MAE), and on the 20% training set of Douban Book, SemRecReg outperforms PMF by up to 22.49% on RMSE and 16.64% on MAE
  • Since weighted meta paths contain rich semantics, future work includes a recommender system based on our SemRec method
Methods
  • In order to show the effectiveness of the proposed SemRec, the authors compare four variations of SemRec with the state of the art.
  • Note that the top-k recommendation methods [6, 36] are not included here, since the problem they solve is different from the rating prediction in this paper.
  • HeteMF [35]: A matrix factorization method with entity similarity regularization, which utilizes the relations in HIN.
  • How to select meta paths for real applications is an open problem in HIN [23].
Results
  • On the 20% training set of Douban Movie, SemRecReg outperforms PMF by up to 19.55% on RMSE and 15.89% on MAE, and on the 20% training set of Douban Book, SemRecReg outperforms PMF by up to 22.49% on RMSE and 16.64% on MAE.
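For readers unfamiliar with the two metrics, the sketch below shows how RMSE and MAE are computed and one common way to express an improvement over a baseline as a percentage. It is an illustration only: the function names are hypothetical, and reading "outperforms by X%" as a relative error reduction is an assumption, since this summary does not state the exact formula.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_error: float, method_error: float) -> float:
    """Percentage by which `method_error` is lower than `baseline_error`.

    Assumption: improvement is measured as relative error reduction.
    """
    return 100.0 * (baseline_error - method_error) / baseline_error
```

Both metrics are errors, so a lower value is better and a positive improvement means the method beats the baseline.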
Conclusion
  • The authors find that the unified weight learning method (L1 in (8)) is a special case of personalized weight learning (L2 in (10)), when the weights of all users on path Pl (i.e., W (l)) have the same value.
  • Through setting different meta paths, SemRec can flexibly realize different recommendation models and generate different recommendations complying with path semantics. In this paper, the authors extend conventional HIN and meta path for information networks with attribute values on links, and apply them to recommender systems.
  • The authors design a novel semantic path based personalized recommendation method SemRec. The SemRec method flexibly integrates heterogeneous information through setting meta paths, and obtains the prioritized and personalized weights representing user preferences on paths.
  • The system will tell users which meta path is of high weight and the most similar user to them according to the weights
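The special-case relation between unified and personalized weights can be illustrated with a small sketch. It assumes that the final prediction is a weighted sum of per-path predictions, which is not spelled out in this summary; the objective functions referred to as (8) and (10) are not reproduced here, and all names and numbers below are hypothetical.

```python
from typing import List


def predict_unified(per_path: List[float], path_weights: List[float]) -> float:
    """Combine per-path predictions with one weight per meta path,
    shared by all users."""
    return sum(w * r for w, r in zip(path_weights, per_path))


def predict_personalized(
    per_path: List[float], user_weights: List[List[float]], user: int
) -> float:
    """Combine per-path predictions with one weight per (user, path) pair."""
    return sum(w * r for w, r in zip(user_weights[user], per_path))


# Hypothetical per-path predictions for one (user, item) pair on three paths.
per_path = [4.0, 3.0, 5.0]
shared = [0.5, 0.3, 0.2]

# When every user carries the same weight row, the personalized model
# produces exactly the unified prediction.
identical_rows = [list(shared) for _ in range(4)]
```

With `identical_rows`, `predict_personalized` returns the same value as `predict_unified` for every user index, which mirrors the statement that unified weight learning is a special case of personalized weight learning.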
Tables
  • Table 1: The meanings and corresponding recommendation models of meta paths
  • Table 2: Examples and their semantic meanings of weighted meta paths
  • Table 3: Statistics of datasets
  • Table 4: Meta paths used in experiments for three datasets (Douban Movie, Yelp, and Douban Book)
  • Table 5: Effectiveness experiments on RMSE performances for three datasets
  • Table 6: Effectiveness experiments on MAE performances for three datasets
  • Table 7: Running time for three datasets with 60% training setting (seconds)
  • Table 8: Top-5 most similar authors to “Yizhou Sun” and “Charu C. Aggarwal”
Related work
  • In this section, we briefly summarize the related work in three aspects: collaborative filtering, heterogeneous information networks, and HIN-based recommendation.

    2.1 Collaborative filtering

    As one of the most popular recommendation approaches, collaborative filtering techniques have drawn a lot of attention from different perspectives. Among these techniques, matrix factorization has been verified to be effective and efficient in recommender systems: it factorizes the user-item rating matrix into a lower-rank user factor matrix and item factor matrix, and then makes predictions from the factorized matrices [27]. With the popularity of social media networks, more and more research focuses on social recommender systems, which take advantage of social relations among users.

    Many researchers have utilized trust information among users. Ma et al. [19] fused the user-item matrix with the users’ social trust networks by sharing a common latent low-dimensional user feature matrix. Furthermore, they [20] coined the social trust ensemble to represent the formulation of the social trust restrictions. Some research began to take into account the friend relations among users. In [21], an additional social regularization term guarantees a closer distance between the latent feature vectors of friends with similar interests. The memberships of users have also been explored to boost collaborative filtering [37]. Recently, researchers have attempted to utilize the widely available attribute information of users and items for social recommendation. Yin et al. [34] proposed a joint probabilistic generative model based on Latent Dirichlet Allocation to mimic user check-in behaviors in the decision process. By exploiting both the co-occurrence pattern of spatial items and the content of spatial items, Wang et al. [32] paid attention to user interests and the preference of the crowd in the target region, and then proposed a new graphical model called Geo-SAGE. However, these methods focus only on obtaining user or item profiles and ignore the structural information between users and items.
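As a concrete picture of the matrix factorization idea described above, the following sketch factorizes a tiny rating set with stochastic gradient descent. It is a generic textbook-style illustration, not any of the cited methods: the toy ratings, the hyperparameters, and the names `factorize` and `training_rmse` are assumptions.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)

# Hypothetical toy ratings for 3 users and 3 items.
toy_ratings: List[Rating] = [
    (0, 0, 5.0), (0, 1, 1.0),
    (1, 0, 4.0), (1, 2, 5.0),
    (2, 1, 5.0), (2, 2, 1.0),
]


def factorize(ratings: List[Rating], n_users: int, n_items: int, rank: int = 2,
              lr: float = 0.02, reg: float = 0.02, epochs: int = 500,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Learn user factors U and item factors V so that U[u] . V[i] ~ rating."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(rank):
                u_k, v_k = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * v_k - reg * u_k)
                V[i][k] += lr * (err * u_k - reg * v_k)
    return U, V


def training_rmse(ratings: List[Rating], U: List[List[float]],
                  V: List[List[float]]) -> float:
    """RMSE of the factor model on the ratings it was trained on."""
    total = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
                for u, i, r in ratings)
    return math.sqrt(total / len(ratings))
```

A missing rating for user `u` and item `i` is then predicted as the dot product of `U[u]` and `V[i]`. Social or HIN-based variants add further regularization terms to the same objective.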
Funding
  • This work is supported in part by the National Natural Science Foundation of China (Nos. 61772082, 61375058, 61472468), the National Key Research and Development Program of China (2017YFB0803304), and the Co-construction Project of Beijing Municipal Commission of Education.
Study subjects and analysis
special cases: 3
In order to show the effectiveness of the proposed SemRec, we compare four variations of SemRec with the state of the art. Besides the personalized weight learning method with weight regularization (called SemRecReg), we include three special cases of SemRec: the single path based method (called SemRecSgl), the unified weight learning method for all users (called SemRecAll), and the personalized weight learning method for individual users (called SemRecInd). As the baselines, four representative rating prediction methods are described as follows

datasets: 3
As the baselines, four representative rating prediction methods are described as follows. Note that the top-k recommendation methods [6, 36] are not included here, since the problem they solve is different from the rating prediction in this paper.

– PMF [22]: It is the basic matrix factorization method using only user-item matrix for recommendations.

– SMF [21]: It adds the social regularization term into PMF, which aims at getting the users’ latent factor closer to their friends’ latent factors.

– CMF [16]: A collective matrix factorization method, which factorizes all relations in HIN and shares the latent factor of same object types in different relations.

– HeteMF [35]: A matrix factorization method with entity similarity regularization, which also utilizes the relations in HIN.

We employ 5 meaningful meta paths whose lengths are not longer than 4 for the three datasets, since longer meta paths are not meaningful and fail to produce good similarity measures [28]. How to select meta paths for real applications is an open problem in HIN [23]

real datasets: 3
Experiments on three real datasets illustrate that SemRec achieves better recommendation performance by flexibly integrating information with the help of weighted meta paths. Moreover, extensive experiments validate the benefits of weighted meta paths

real datasets: 3
In addition, SemRec can obtain the prioritized and personalized weight preferences on multiple meta paths, which are important for real applications, e.g., user characteristics analysis and recommendation explanation. Empirical studies on three real datasets, Douban Movie, Yelp and Douban Book, demonstrate the power of SemRec. SemRec outperforms the state of the art, especially for cold-start users and items, and the personalized weights learned by SemRec are able to reflect user preferences on paths

users: 3
$$\frac{PM_{P_L}(x,:)^{T}\,PM_{P_R}(y,:)}{\lVert PM_{P_L}(x,:)\rVert\,\lVert PM_{P_R}(y,:)\rVert}.$$ Taking PathSim as an example, we illustrate its calculation process along conventional and weighted meta paths in Figure 3, where the rating matrix between 3 users and 2 movies is taken from Figure 1. PathSim counts the number of path instances connecting two objects along a conventional meta path with a normalization term (shown in the upper half of Figure 3), and thus it treats all the users as the same

real datasets: 3
It is a quadratic programming problem with complexity O((|R| + |U|^2) × |P|). In this section, extensive experiments on three real datasets illustrate the traits of SemRec from five aspects. We first validate the effectiveness of SemRec, especially for the cold-start problem

users: 13367
In order to get more comprehensive heterogeneous information, we crawled a user-movie dataset from Douban, a well-known social media network in China. The dataset includes 13367 users and 12677 movies with 1068278 movie ratings ranging from 1 to 5. The dataset includes the social relations among users and the attribute information of users and movies

users: 16239
This dataset contains user ratings on local businesses and attribute information of users and businesses. We ignore users and businesses that have no related ratings and finally get 16239 users and 14284 local businesses with 198397 ratings ranging from 1 to 5. The last dataset is the Douban Book dataset.

users: 13024
The last dataset is the Douban Book dataset. This dataset includes 13024 users and 22347 books with 792026 ratings ranging from 1 to 5, and contains social relations among users and attribute information of users and books. The detailed description of these three datasets can be seen in Table 3, and their network schemas are shown in Figure 2

datasets: 3
The detailed description of these three datasets can be seen in Table 3, and their network schemas are shown in Figure 2. We can find that these three datasets have different properties. Douban Movie dataset has dense rating relations but sparse social relations, Yelp dataset has sparse rating relations but dense social relations, while Douban Book dataset has relatively medium dense rating information with dense social relations
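The claim about rating density can be checked against the user, item, and rating counts quoted in the snippets above. The sketch below is a simple arithmetic check; the helper name `rating_density` is hypothetical, and social-relation density is left out because the number of social links is not given in this summary.

```python
from typing import Dict, Tuple

# (users, items, ratings) as quoted in the dataset descriptions.
datasets: Dict[str, Tuple[int, int, int]] = {
    "Douban Movie": (13367, 12677, 1068278),
    "Yelp": (16239, 14284, 198397),
    "Douban Book": (13024, 22347, 792026),
}


def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of the user-item matrix that holds an observed rating."""
    return n_ratings / (n_users * n_items)


densities = {name: rating_density(*stats) for name, stats in datasets.items()}
for name, density in densities.items():
    print(f"{name}: {100 * density:.2f}% of user-item pairs are rated")
```

The densities come out at roughly 0.63% for Douban Movie, 0.09% for Yelp, and 0.27% for Douban Book, which agrees with the description of dense, sparse, and medium rating relations respectively.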

cluster centers: 5
Based on the results of SemRecReg on the Douban Movie dataset with 60% of the data used for training in the above experiments, we cluster users’ weight vectors into 5 groups using K-means, and then show the statistical information of users in the five clusters in Figure 5a. Moreover, the weight preferences of the five cluster centers on the 5 meta paths are shown in Figure 5b. Let’s observe the relationship between the statistical information of users in different clusters and their weight preferences on paths from Figure 5a and b
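The clustering step can be pictured with a plain K-means (Lloyd's algorithm) over per-user weight vectors. This is a generic sketch under stated assumptions, not the authors' code: the `kmeans` function, its parameters, and the toy two-dimensional weight vectors are hypothetical, whereas the experiment above uses one weight per meta path (5 dimensions) and 5 clusters.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 50,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Cluster `points` into `k` groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialize centers with k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical weight vectors of four users on two meta paths.
weights = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
centers, labels = kmeans(weights, k=2)
```

Each returned center can be read as the typical path preference of one user group, which is the role the cluster centers play in Figure 5b.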

papers: 23661
A basic task in a bibliographic network is top-k similarity search for a given author. In order to evaluate the performance of weighted meta paths in top-k similarity search, we perform the following experiments on a DBLP dataset including 23661 papers, 26741 authors, 20 conferences and 73603 links connecting authors and papers. In this experiment, we compare the similarity search results based on the APCPA path, a conventional meta path which connects authors publishing papers in the same conference, and the A(i)PCP(j)A|i = j path, a weighted meta path which connects authors publishing papers in the same conference with the same author order
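The difference between the two paths can be sketched by counting path instances with and without the author-order constraint. The toy bibliography, the function names, and the PathSim-style normalization below are assumptions for illustration; they are not the DBLP data or the exact procedure used in the experiment.

```python
from typing import Dict, List, Tuple

# Hypothetical toy bibliography: paper -> (conference, authors in byline order).
papers: Dict[str, Tuple[str, List[str]]] = {
    "p1": ("KDD", ["alice", "bob"]),
    "p2": ("KDD", ["alice", "carol"]),
    "p3": ("KDD", ["dave", "bob"]),
}


def path_instances(a: str, b: str, same_order: bool) -> int:
    """Count Author-Paper-Conference-Paper-Author instances from a to b.

    With same_order=True only instances where both authors hold the same
    byline position are counted, mimicking the constraint i = j.
    """
    count = 0
    for conf1, authors1 in papers.values():
        if a not in authors1:
            continue
        for conf2, authors2 in papers.values():
            if b not in authors2 or conf1 != conf2:
                continue
            if same_order and authors1.index(a) != authors2.index(b):
                continue
            count += 1
    return count


def pathsim(a: str, b: str, same_order: bool) -> float:
    """PathSim-style normalization of the path instance count."""
    denom = path_instances(a, a, same_order) + path_instances(b, b, same_order)
    return 2.0 * path_instances(a, b, same_order) / denom if denom else 0.0
```

In this toy data `alice` is always first author and `bob` always second, so the conventional path rates them as highly similar while the order-constrained path finds no connecting instance between them.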

Reference
  • 1. Burke, R., Vahedian, F., Mobasher, B.: Hybrid recommendation in heterogeneous networks. In: UMAP, pp. 49–60 (2014)
  • 2. Cao, X., Zheng, Y., Shi, C., Li, J., Wu, B.: Meta-path-based link prediction in schema-rich heterogeneous information network. Int. J. Data Sci. Analytics 3(4), 285–296 (2017)
  • 3. Feng, W., Wang, J.: Incorporating heterogeneous information for personalized tag recommendation in social tagging systems. In: KDD, pp. 1276–1284 (2012)
  • 4. Han, J.: Mining heterogeneous information networks: the next frontier. In: KDD, keynote speech (2012)
  • 5. Haveliwala, T.H.: Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)
  • 6. Jamali, M., Lakshmanan, L.V.: Heteromf: recommendation in heterogeneous information networks using context dependent factor models. In: WWW, pp. 643–653 (2013)
  • 7. Ji, M., Han, J., Danilevsky, M.: Ranking-based classification of heterogeneous information networks. In: KDD, pp. 1298–1306 (2011)
  • 8. Kuo, T.T., Yan, R., Huang, Y., Kung, P.H., Lin, S.D.: Unsupervised link prediction using aggregative statistics on heterogeneous social networks. In: SIGKDD, pp. 775–783 (2013)
  • 9. Lao, N., Cohen, W.: Fast query execution for retrieval models based on path constrained random walks. In: KDD, pp. 881–888 (2010)
  • 10. Lao, N., Cohen, W.W.: Relational retrieval using a combination of path-constrained random walks. Mach. Learn. 81(2), 53–67 (2010)
  • 11. Lee, S., Song, S., Kahng, M., Lee, D., Lee, S.: Random walk based entity ranking on graph for multidimensional recommendation. In: RecSys, pp. 93–100 (2011)
  • 12. Lee, S., Park, S., Kahng, M., Lee, S.: Ranking nodes on a heterogeneous graph for flexible hybrid recommender systems. Expert Syst. Appl. 40, 684–697 (2013)
  • 13. Li, X., Chen, H.: Recommendation as link prediction in bipartite graphs: a graph kernel-based machine learning approach. Decis. Support Syst. 54, 880–890 (2013)
  • 14. Li, Y., Shi, C., Yu, P.S., Chen, Q.: Hrank: a path based ranking method in heterogeneous information network. In: WAIM, pp. 553–565 (2014)
  • 15. Lin, C.J.: Projected gradient methods for non-negative matrix factorization. In: Neural Computation, pp. 2756–2779 (2007)
  • 16. Lippert, C., Weber, S.H., Huang, Y., Tresp, V., Schubert, M., Kriegel, H.P.: Relation prediction in multirelational domains using matrix factorization. In: NIPS Workshop on Structured Input Structure Output (2008)
  • 17. Liu, X., Yu, Y., Guo, C., Sun, Y.: Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. In: CIKM, pp. 121–130 (2014)
  • 18. Luo, C., Pang, W., Wang, Z., Lin, C.: Hete-cf: social-based collaborative filtering recommendation using heterogeneous relations. In: ICDM, pp. 917–922 (2014)
  • 19. Ma, H., Yang, H., Lyu, M.R., King, I.: Sorec: social recommendation using probabilistic matrix factorization. In: CIKM, pp. 931–940 (2008)
  • 20. Ma, H., King, I., Lyu, M.R.: Learning to recommend with social trust ensemble. In: SIGIR, pp. 203–210 (2011)
  • 21. Ma, H., Zhou, D., Liu, C., Lyu, M.R., King, I.: Recommender systems with social regularization. In: WSDM, pp. 287–296 (2011)
  • 22. Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS, vol. 20 (2008)
  • 23. Shi, C., Kong, X., Huang, Y., Yu, P.S., Wu, B.: Hetesim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014)
  • 24. Shi, C., Zhang, Z., Luo, P., Yu, P.S., Yue, Y., Wu, B.: Semantic path based personalized recommendation on weighted heterogeneous information networks. In: CIKM, pp. 453–462 (2015)
  • 25. Shi, C., Liu, J., Zhuang, F., Yu, P.S., Wu, B.: Integrating heterogeneous information via flexible regularization framework for recommendation. Knowl. Inf. Syst. 49(3), 835–859 (2016)
  • 26. Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis.
  • 27. Srebro, N., Jaakkola, T.: Weighted low-rank approximations. In: ICML, pp. 720–727 (2003)
  • 28. Sun, Y., Han, J., Yan, X., Yu, P., Wu, T.: Pathsim: meta path-based top-k similarity search in heterogeneous information networks. In: VLDB, pp. 992–1003 (2011)
  • 29. Sun, Y., Han, J.: Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explor. 14(2), 20–28 (2012)
  • 30. Sun, Y., Norick, B., Han, J., Yan, X., Yu, P.S., Yu, X.: Integrating meta-path selection with user guided object clustering in heterogeneous information networks. In: KDD, pp. 1348–1356 (2012)
  • 31. Vahedian, F., Burke, R., Mobasher, B.: Weighted random walks for meta-path expansion in heterogeneous networks. In: RecSys 2016 Poster Proceedings (2016)
  • 32. Wang, W., Yin, H., Chen, L., Sun, Y., Sadiq, S., Zhou, X.: Geo-sage: a geographical sparse additive generative model for spatial item recommendation. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264. ACM (2015)
  • 33. Xie, M., Yin, H., Wang, H., Xu, F., Chen, W., Wang, S.: Learning graph-based poi embedding for location-based recommendation. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 15–24. ACM (2016)
  • 34. Yin, H., Zhou, X., Shao, Y., Wang, H., Sadiq, S.: Joint modeling of user check-in behaviors for point-of-interest recommendation. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1631–1640. ACM (2015)
  • 35. Yu, X., Ren, X., Gu, Q., Sun, Y., Han, J.: Collaborative filtering with entity similarity regularization in heterogeneous information networks. In: IJCAI HINA (2013)
  • 36. Yu, X., Ren, X., Sun, Y., Gu, Q., Sturt, B., Khandelwal, U., Norick, B., Han, J.: Personalized entity recommendation: a heterogeneous information network approach. In: WSDM, pp. 283–292 (2014)
  • 37. Yuan, Q., Chen, L., Zhao, S.: Factorization vs. regularization: fusing heterogeneous social relationships in top-n recommendation. In: RecSys, pp. 245–252 (2011)
  • 38. Zhang, Z., Zhou, T., Zhang, Y.: Personalized recommendation via integrated diffusion on user-item-tag tripartite graphs. Physica A: Stat. Mech. Appl. 389, 179–186 (2010)
  • 39. Zheng, J., Liu, J., Shi, C., Zhuang, F., Li, J., Wu, B.: Dual similarity regularization for recommendation. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 542–554.
  • 40. Zheng, J., Liu, J., Shi, C., Zhuang, F., Li, J., Wu, B.: Recommendation in heterogeneous information network via dual similarity regularization. Int. J. Data Sci. Analytics 3(1), 35–48 (2017)