Supervised random walks: predicting and recommending links in social networks

Proceedings of the fourth ACM international conference on Web search and data mining, pp. 635-644, 2011.

DOI: https://doi.org/10.1145/1935826.1935914

Abstract

Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively stud...

Introduction
  • Large real-world networks exhibit a range of interesting properties and patterns [7, 20].
  • Research seeks to develop models that will accurately predict the global structure of the network [7, 20, 19, 6].
  • Studying the networks at a level of individual edge creations is interesting and in some respects more difficult than global network modeling.
  • Identifying the mechanisms by which such social networks evolve at the level of individual edges is a fundamental question that is still not well understood, and it forms the motivation for this work.
Highlights
  • We develop the concept of Supervised Random Walks, which combines the network structure with the characteristics of the nodes and edges of the network in a natural and principled way to form a unified link prediction algorithm.
  • We develop a method based on Supervised Random Walks that learns, in a supervised way, how to bias a PageRank-like random walk on the network [3, 2] so that it visits given nodes more often than the others.
  • Overall, Supervised Random Walks (SRW) give a significant improvement over the unweighted Random Walk with Restarts (RWR).
  • On Facebook (Tab. 3), Random Walk with Restarts already gives a near-optimal area under the ROC curve (AUC), while Supervised Random Walks still obtain an 11% relative improvement in Precision at Top 20.
  • We have proposed Supervised Random Walks, a new learning algorithm for link prediction and link recommendation.
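
The idea of biasing a PageRank-like random walk through learned edge strengths can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the logistic form of the edge strength function, the restart probability `alpha = 0.3`, the function names `edge_strengths` and `biased_walk_scores`, and the four-node toy graph. The weight vector `w` is simply given here; learning it from training data is the part that Supervised Random Walks add and is not shown.

```python
import numpy as np


def edge_strengths(features, w):
    """Map per-edge feature vectors to positive strengths.

    features: (n, n, d) array, features[u, v] describes edge (u, v).
    w: (d,) weight vector. The logistic form is an assumption.
    """
    return 1.0 / (1.0 + np.exp(-(features @ w)))


def biased_walk_scores(adj, features, w, source, alpha=0.3,
                       tol=1e-12, max_iter=10_000):
    """Stationary distribution of a random walk with restarts at `source`,
    where transition probabilities are proportional to edge strengths."""
    strengths = adj * edge_strengths(features, w)  # zero where no edge exists
    row_sums = strengths.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    transition = strengths / row_sums  # row-stochastic matrix

    restart = np.zeros(adj.shape[0])
    restart[source] = 1.0
    p = restart.copy()
    for _ in range(max_iter):
        # Follow an edge with probability 1 - alpha, jump back otherwise.
        p_next = (1.0 - alpha) * (p @ transition) + alpha * restart
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy graph: 0-1, 0-2, 1-3, 2-3 (undirected), one feature marking edge 0-1.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
features = np.zeros((4, 4, 1))
features[0, 1, 0] = features[1, 0, 0] = 1.0

unbiased = biased_walk_scores(adj, features, np.array([0.0]), source=0)
biased = biased_walk_scores(adj, features, np.array([2.0]), source=0)
print("unbiased:", unbiased)
print("biased:  ", biased)
```

With `w = 0` every edge has the same strength, so the walk reduces to an unweighted Random Walk with Restarts and nodes 1 and 2 receive equal scores. A positive weight on the marked edge shifts probability mass toward node 1, which is the kind of bias the learning step is meant to discover.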
Methods
  • Methods compared: Random Walk with Restart; Degree; decision trees (DT) on node features, path features, and all features; logistic regression (LR) on node features, path features, and all features; Supervised Random Walks (SRW) with one edge type; and SRW with multiple edge types.
  • ... independent datasets, one for training and one for testing.
  • Note that much of the improvement in the curve comes in the area near the origin, corresponding to the nodes with the highest predicted values.
  • For the supervised machine learning methods, the authors experiment with decision trees and logistic regression and divide the features used for training them into three groups: node features, path features, and all features.
Results
  • The evaluation asks two questions: first, how well the model performs in terms of classification accuracy, and second, whether it recovers the edge strength function parameters w∗ = [1, −1].
  • In the deterministic case of creating D, with zero noise added, the authors hope that the algorithm is able to achieve near-perfect classification.
  • With noise added, the authors expect the performance to drop, but even then they hope that the recovered values of w will be close to the true w∗.
Conclusion
  • The authors have proposed Supervised Random Walks, a new learning algorithm for link prediction and link recommendation.
  • The resulting predictions show large improvements over Random Walks with Restarts and compare favorably to supervised machine learning techniques that require tedious feature extraction and generation.
  • Supervised Random Walks are not limited to link prediction and can be applied to many other problems that require learning to rank nodes in a graph, like recommendations, anomaly detection, missing link, and expertise search and ranking.
Tables
  • Table 1: Dataset statistics. N, E: number of nodes and edges in the full network; S: number of sources; C: avg. number of candidates per source; D: avg. number of destination nodes.
  • Table 2: Hep-Ph co-authorship network. DT: decision tree, LR: logistic regression, and SRW: Supervised Random Walks.
  • Table 3: Results for the Facebook dataset.
  • Table 4: Results for all datasets. Supervised Random Walks (SRW) compare favorably to logistic regression run on all features and perform significantly better than the baseline in all cases on ROC area. The variance on the Top 20 metric is too high, so the two methods are statistically tied on that metric.
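
The two metrics reported in the tables, ROC area and Precision at Top 20, can be sketched as follows. This is an illustrative sketch and not the authors' evaluation code: the function names `roc_auc` and `precision_at_k` are hypothetical, AUC is computed through its pairwise (Wilcoxon-Mann-Whitney) form with ties counted as one half, and precision divides by `k` even when fewer than `k` candidates exist, which is an assumption.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k=20):
    """Fraction of the k highest-scored candidates that are positives."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(1 for _, y in ranked[:k] if y) / k


# Toy example: four candidates for one source, two of them true links.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))            # 0.75
print(precision_at_k(scores, labels, 2))  # 0.5
```

ROC area summarizes the whole ranking, while Precision at Top 20 looks only at the head of the list, which is why a method can be near-optimal on the first and still leave room for improvement on the second.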
Funding
  • Research was supported in part by NSF CNS-1010921, NSF IIS-1016909, AFRL FA8650-10-C-7058, the Albert Yu & Mary Bechmann Foundation, IBM, Lightspeed, Microsoft, and Yahoo.
References
  • L. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
  • A. Agarwal and S. Chakrabarti. Learning random walks to rank nodes in graphs. In ICML ’07, pages 9–16, 2007.
  • A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD ’06, pages 14–23, 2006.
  • A. Andrew. Iterative computation of derivatives of eigenvalues and eigenvectors. IMA Journal of Applied Mathematics, 24(2):209–218, 1979.
  • A. L. Andrew. Convergence of an iterative method for derivatives of eigensystems. Journal of Computational Physics, 26:107–112, 1978.
  • L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD ’06, pages 44–54, 2006.
  • A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
  • A. Blum, H. Chan, and M. Rwebangira. A random-surfer web-graph model. In ANALCO ’06, 2006.
  • A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, May 2008.
  • J. Coleman. Social capital in the creation of human capital. The American Journal of Sociology, 94:S95–S120, 1988.
  • M. Diligenti, M. Gori, and M. Maggini. Learning web page scores by error back-propagation. In IJCAI ’05, 2005.
  • J. Gehrke, P. Ginsparg, and J. M. Kleinberg. Overview of the 2003 KDD Cup. SIGKDD Explorations, 5(2):149–151, 2003.
  • M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD ’10, 2010.
  • M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1380, 1973.
  • T. H. Haveliwala. Topic-sensitive PageRank. In WWW ’02, pages 517–526, 2002.
  • K. Henderson and T. Eliassi-Rad. Applying latent Dirichlet allocation to group discovery in large graphs. In SAC ’09, pages 1456–1461, 2009.
  • G. Jeh and J. Widom. Scaling personalized web search. In WWW ’03, pages 271–279, 2003.
  • R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In FOCS ’00, page 57, 2000.
  • J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In KDD ’08, pages 462–470, 2008.
  • J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD ’05, pages 177–187, 2005.
  • D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM ’03, pages 556–559, 2003.
  • D. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989. doi:10.1007/BF01589116.
  • R. Minkov and W. W. Cohen. Learning to rank typed graph walks: Local and global approaches. In WebKDD/SNA-KDD ’07, pages 1–8, 2007.
  • S. Myers and J. Leskovec. On the convexity of latent social network inference. In NIPS ’10, 2010.
  • L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Dig. Lib. Tech. Proj., 1998.
  • A. Popescul, R. Popescul, and L. H. Ungar. Statistical relational learning for link prediction, 2003.
  • P. Sarkar and A. W. Moore. Fast dynamic reranking in large graphs. In WWW ’09, pages 31–40, 2009.
  • B. Taskar, M. F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In NIPS ’03, 2003.
  • H. Tong and C. Faloutsos. Center-piece subgraphs: problem definition and fast solutions. In KDD ’06, pages 404–413, 2006.
  • H. Tong, C. Faloutsos, and Y. Koren. Fast direction-aware proximity for graph mining. In KDD ’07, pages 747–756, 2007.
  • H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM ’06, 2006.
  • L. Yan, R. Dodier, M. Mozer, and R. Wolniewicz. Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic. In ICML ’03, pages 848–855, 2003.