# Supervised random walks: predicting and recommending links in social networks

Proceedings of the fourth ACM international conference on Web search and data mining, pp. 635-644, 2011.



## Abstract

Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem, we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively stud...
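The per-source setup implied by the dataset statistics in Table 1 (sources S, candidates C per source, destination nodes D) can be illustrated with a short sketch. This is not the authors' code. The function names, the name `no_link` for candidates that never link, and the rule that candidates are friends-of-friends of the source are all assumptions made for illustration.

```python
from collections import defaultdict

def build_adjacency(edge_list):
    """Undirected adjacency sets built from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def link_prediction_sets(old_edges, new_edges, source):
    """Split the candidates of `source` into destinations (nodes it links to
    in the later snapshot) and no-link nodes, using two network snapshots.

    Assumption (hypothetical): candidates are the nodes exactly two hops
    away from `source` in the old snapshot.
    """
    adj = build_adjacency(old_edges)
    neighbors = adj[source]
    candidates = set()
    for n in neighbors:
        candidates |= adj[n]
    candidates -= neighbors | {source}  # drop existing friends and the source
    future = build_adjacency(new_edges)[source]
    destinations = candidates & future
    no_link = candidates - destinations
    return candidates, destinations, no_link

old = [("s", "a"), ("a", "b"), ("a", "c"), ("c", "d")]
new = old + [("s", "b")]
C, D, L = link_prediction_sets(old, new, "s")
# b and c are two hops from s, and only b becomes linked to s later.
```

Node d is three hops from s, so the assumed friends-of-friends rule excludes it from the candidate set.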


## Introduction

- Large real-world networks exhibit a range of interesting properties and patterns [7, 20].
- Research seeks to develop models that will accurately predict the global structure of the network [7, 20, 19, 6].
- Studying networks at the level of individual edge creations is interesting and, in some respects, more difficult than global network modeling.
- Identifying the mechanisms by which such social networks evolve at the level of individual edges is a fundamental question that is still not well understood, and it forms the motivation for this work.

## Highlights

- We develop the concept of Supervised Random Walks, which combines, in a natural and principled way, the network structure with the characteristics of the network's nodes and edges into a unified link prediction algorithm.
- We develop a method based on Supervised Random Walks that learns, in a supervised way, how to bias a PageRank-like random walk on the network [3, 2] so that it visits the given nodes more often than the others.
- Overall, Supervised Random Walks (SRW) give a significant improvement over the unweighted Random Walk with Restarts (RWR).
- On the Facebook dataset (Table 3), Random Walk with Restarts already gives a near-optimal area under the ROC curve, while Supervised Random Walks still obtain an 11% relative improvement in Precision at Top 20.
- We have proposed Supervised Random Walks, a new learning algorithm for link prediction and link recommendation.
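The central idea above is to bias a PageRank-like random walk with restarts using learned edge strengths. A minimal pure-Python sketch of that idea follows. It assumes an exponential edge-strength function exp(w · ψ_uv), a restart probability `alpha=0.3`, and a fixed number of power iterations. These choices, and the names `edge_strength` and `supervised_rwr`, are illustrative and not taken from the paper.

```python
import math

def edge_strength(psi, w):
    """Edge strength a_uv = exp(w . psi_uv); an assumed, illustrative choice."""
    return math.exp(sum(wi * xi for wi, xi in zip(w, psi)))

def supervised_rwr(graph, source, w, alpha=0.3, iters=100):
    """Random walk with restarts from `source`, where the walk follows each
    out-edge with probability proportional to its learned strength.

    graph: dict mapping node -> list of (neighbor, feature_vector) pairs.
    Returns a dict mapping every node to its visiting probability.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(v for v, _ in nbrs)
    # Turn edge strengths into per-node transition probabilities.
    trans = {}
    for u, nbrs in graph.items():
        strengths = [(v, edge_strength(psi, w)) for v, psi in nbrs]
        total = sum(a for _, a in strengths)
        trans[u] = [(v, a / total) for v, a in strengths]
    p = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):  # power iteration
        nxt = {u: 0.0 for u in nodes}
        for u, pu in p.items():
            out = trans.get(u)
            if not out:
                # Dangling node: return all of its mass to the source.
                nxt[source] += pu
                continue
            nxt[source] += alpha * pu  # restart (teleport) to the source
            for v, q in out:
                nxt[v] += (1 - alpha) * pu * q
        p = nxt
    return p

# Toy graph: the edge s->a carries feature 1.0, the edge s->b feature 0.0.
g = {
    "s": [("a", [1.0]), ("b", [0.0])],
    "a": [("s", [0.0])],
    "b": [("s", [0.0])],
}
scores = supervised_rwr(g, "s", w=[2.0])
# With w = [2.0], s->a is the stronger edge, so a is visited more often than b.
```

Setting `w` to all zeros makes every strength exp(0) = 1. In that case the sketch reduces to the unweighted Random Walk with Restarts used as the baseline.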

## Methods

- Methods compared (figure legend): Random Walk with Restart; Degree; DT (decision tree) with node, path, or all features; LR (logistic regression) with node, path, or all features; SRW with one edge type; SRW with multiple edge types.
- ... independent datasets, one for training and one for testing.
- Note that much of the improvement in the curve comes in the area near the origin, corresponding to the nodes with the highest predicted values.
- For the supervised machine learning baselines, the authors experiment with decision trees and logistic regression and group the features used for training them into three groups: ...
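This summary does not list the exact feature groups used by the decision-tree and logistic-regression baselines. As a hedged illustration only, the sketch below computes two classic path-based link-prediction scores for a (source, candidate) pair: the number of common neighbors and the Adamic-Adar score from the cited Adamic and Adar paper. It also includes the two node degrees. The feature choice and the name `pair_features` are hypothetical.

```python
import math

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))

def adamic_adar(adj, u, v):
    """Adamic-Adar score: sum of 1 / log(degree(z)) over shared neighbors z."""
    score = 0.0
    for z in adj.get(u, set()) & adj.get(v, set()):
        deg = len(adj.get(z, ()))
        if deg > 1:  # skip degree-1 nodes, since log(1) = 0
            score += 1.0 / math.log(deg)
    return score

def pair_features(adj, s, c):
    """Hypothetical feature vector for a (source, candidate) pair."""
    return [
        common_neighbors(adj, s, c),
        adamic_adar(adj, s, c),
        len(adj.get(s, ())),  # degree of the source
        len(adj.get(c, ())),  # degree of the candidate
    ]

adj = {
    "s": {"a", "b"},
    "a": {"s", "c"},
    "b": {"s", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
x = pair_features(adj, "s", "c")  # s and c share the neighbors a and b
```

Feature vectors like `x` would be fed to an off-the-shelf classifier. The text contrasts this style of hand-built features with SRW's direct use of the graph.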

## Results

- The evaluation asks two questions: first, how well the model performs in terms of classification accuracy, and second, whether it recovers the edge strength function parameters w∗ = [1, −1].
- When the set of destination nodes D is created deterministically and no noise is added, the authors expect the algorithm to achieve near-perfect classification.
- As noise is added, the authors expect performance to drop, but they hope that even then the recovered values of w will be close to the true w∗.
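Recovering w∗ means fitting the edge-strength parameters so that destination nodes outrank no-link nodes in the random-walk scores. The sketch below shows one such training objective. It assumes the form ||w||² + λ Σ h(p_l − p_d), summed over destination/no-link pairs, where h is a sigmoid approximation of the Wilcoxon-Mann-Whitney loss (the approach of the Yan et al. paper in the references). The width `b`, the weight `lam`, and the function names are illustrative assumptions.

```python
import math

def wmw_loss(x, b=0.01):
    """Sigmoid approximation of the Wilcoxon-Mann-Whitney step loss.

    x = p_l - p_d. The loss is near 1 when a no-link node l outranks a
    destination d (x > 0) and near 0 otherwise. Scores are probabilities,
    so |x| <= 1 and exp(-x / b) stays within float range for b = 0.01.
    """
    return 1.0 / (1.0 + math.exp(-x / b))

def srw_objective(w, scores, destinations, no_link, lam=1.0, b=0.01):
    """Regularized pairwise ranking objective:
    ||w||^2 + lam * sum over (d, l) pairs of h(p_l - p_d).

    `scores` maps node -> random-walk score. In a full implementation these
    scores would be recomputed from w by the weighted walk, and w would be
    fitted by gradient-based optimization; that part is omitted here.
    """
    reg = sum(wi * wi for wi in w)
    penalty = sum(wmw_loss(scores[l] - scores[d], b)
                  for d in destinations for l in no_link)
    return reg + lam * penalty

w = [1.0, -1.0]
good = {"d1": 0.4, "d2": 0.3, "l1": 0.1, "l2": 0.05}
bad = {"d1": 0.1, "d2": 0.05, "l1": 0.4, "l2": 0.3}
f_good = srw_objective(w, good, ["d1", "d2"], ["l1", "l2"])
f_bad = srw_objective(w, bad, ["d1", "d2"], ["l1", "l2"])
# f_good is close to ||w||^2 = 2; f_bad is close to 2 + 4 (four mis-ordered pairs).
```

A smooth h is used instead of the exact 0/1 step so that the objective can be differentiated with respect to w. A smaller `b` approximates the step more closely but gives steeper gradients.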

## Conclusion

- The authors have proposed Supervised Random Walks, a new learning algorithm for link prediction and link recommendation.
- The resulting predictions show large improvements over Random Walks with Restarts and compare favorably to supervised machine learning techniques that require tedious feature extraction and generation.
- Supervised Random Walks are not limited to link prediction and can be applied to many other problems that require learning to rank nodes in a graph, such as recommendation, anomaly detection, missing-link prediction, and expertise search and ranking.

## Summary


- Table 1: Dataset statistics. N, E: number of nodes and edges in the full network; S: number of sources; C: average number of candidates per source; D: average number of destination nodes.
- Table 2: Hep-Ph co-authorship network. DT: decision tree; LR: logistic regression; SRW: Supervised Random Walks.
- Table 3: Results for the Facebook dataset.
- Table 4: Results for all datasets. We compare favorably to logistic regression run on all features. Our Supervised Random Walks (SRW) perform significantly better than the baseline in all cases on ROC area. The variance is too high on the Top 20 metric, and the two methods are statistically tied on this metric.
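Table 4 compares the methods by ROC area and by the Top 20 metric (Precision at Top 20). For a single source node, both metrics can be computed from candidate scores as sketched below. The pairwise (Mann-Whitney) form of the AUC and the function names are assumptions for illustration. A dataset-level evaluation would presumably average these values over all sources.

```python
def roc_auc(scores, positives, negatives):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half (Mann-Whitney form)."""
    pairs = 0
    correct = 0.0
    for p in positives:
        for n in negatives:
            pairs += 1
            if scores[p] > scores[n]:
                correct += 1.0
            elif scores[p] == scores[n]:
                correct += 0.5
    return correct / pairs if pairs else float("nan")

def precision_at_k(scores, positives, k=20):
    """Fraction of the k highest-scored candidates that are true destinations.
    Divides by the number actually ranked, which is k unless fewer exist."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return float("nan")
    return sum(1 for node in ranked if node in positives) / len(ranked)

scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
auc = roc_auc(scores, {"a", "c"}, {"b", "d"})     # 0.75: 3 of 4 pairs ordered correctly
p_at_2 = precision_at_k(scores, {"a", "c"}, k=2)  # 0.5: top 2 are a and b
```

The AUC rewards correct ordering across the whole candidate list. Precision at Top k only looks at the head of the ranking, which matters most for recommendation.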

## Funding

- This research was supported in part by NSF CNS-1010921, NSF IIS-1016909, AFRL FA8650-10-C-7058, the Albert Yu & Mary Bechmann Foundation, IBM, Lightspeed, Microsoft, and Yahoo.

## References

- [1] L. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
- [2] A. Agarwal and S. Chakrabarti. Learning random walks to rank nodes in graphs. In ICML ’07, pages 9–16, 2007.
- [3] A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD ’06, pages 14–23, 2006.
- [4] A. Andrew. Iterative computation of derivatives of eigenvalues and eigenvectors. IMA Journal of Applied Mathematics, 24(2):209–218, 1979.
- [5] A. L. Andrew. Convergence of an iterative method for derivatives of eigensystems. Journal of Computational Physics, 26:107–112, 1978.
- [6] L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD ’06, pages 44–54, 2006.
- [7] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
- [8] A. Blum, H. Chan, and M. Rwebangira. A random-surfer web-graph model. In ANALCO ’06, 2006.
- [9] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, May 2008.
- [10] J. Coleman. Social Capital in the Creation of Human Capital. The American Journal of Sociology, 94:S95–S120, 1988.
- [11] M. Diligenti, M. Gori, and M. Maggini. Learning web page scores by error back-propagation. In IJCAI ’05, 2005.
- [12] J. Gehrke, P. Ginsparg, and J. M. Kleinberg. Overview of the 2003 KDD Cup. SIGKDD Explorations, 5(2):149–151, 2003.
- [13] M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD ’10, 2010.
- [14] M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1380, 1973.
- [15] T. H. Haveliwala. Topic-sensitive PageRank. In WWW ’02, pages 517–526, 2002.
- [16] K. Henderson and T. Eliassi-Rad. Applying latent Dirichlet allocation to group discovery in large graphs. In SAC ’09, pages 1456–1461, 2009.
- [17] G. Jeh and J. Widom. Scaling personalized web search. In WWW ’03, pages 271–279, 2003.
- [18] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In FOCS ’00, page 57, 2000.
- [19] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In KDD ’08, pages 462–470, 2008.
- [20] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD ’05, pages 177–187, 2005.
- [21] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM ’03, pages 556–559, 2003.
- [22] D. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989. doi:10.1007/BF01589116.
- [23] E. Minkov and W. W. Cohen. Learning to rank typed graph walks: local and global approaches. In WebKDD/SNA-KDD ’07, pages 1–8, 2007.
- [24] S. Myers and J. Leskovec. On the convexity of latent social network inference. In NIPS ’10, 2010.
- [25] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
- [26] A. Popescul, R. Popescul, and L. H. Ungar. Statistical relational learning for link prediction, 2003.
- [27] P. Sarkar and A. W. Moore. Fast dynamic reranking in large graphs. In WWW ’09, pages 31–40, 2009.
- [28] B. Taskar, M. F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In NIPS ’03, 2003.
- [29] H. Tong and C. Faloutsos. Center-piece subgraphs: problem definition and fast solutions. In KDD ’06, pages 404–413, 2006.
- [30] H. Tong, C. Faloutsos, and Y. Koren. Fast direction-aware proximity for graph mining. In KDD ’07, pages 747–756, 2007.
- [31] H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM ’06, 2006.
- [32] L. Yan, R. Dodier, M. Mozer, and R. Wolniewicz. Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic. In ICML ’03, pages 848–855, 2003.
