# Relational learning via collective matrix factorization

KDD, pp. 650–658, 2008

Abstract

Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and actors...

Introduction

- Relational data consists of entities and relations between them. In many cases, such as relational databases, the numbers of entity types and relation types are fixed.
- One model of Bregman matrix factorization [17] proposes the following decomposable loss function for X ≈ f₁(UVᵀ): L₁(U, V | W) = D_{F₁}(UVᵀ ‖ X, W) + D_G(0 ‖ U) + D_H(0 ‖ V), where G(u) = λ‖u‖²/2 and H(v) = γ‖v‖²/2 for λ, γ > 0 corresponds to ℓ₂ regularization.
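
To make the single-matrix case concrete, here is a minimal sketch of minimizing a loss of this form for the identity link f₁(θ) = θ with squared loss and ℓ₂ regularization, using plain gradient descent. The function name and hyperparameters are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def factorize(X, k, lam=0.1, gam=0.1, lr=0.02, iters=3000, seed=0):
    """Gradient descent on ||X - U V^T||_F^2 / 2 + (lam/2)||U||_F^2 + (gam/2)||V||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    for _ in range(iters):
        R = U @ V.T - X                # residual of the current reconstruction
        U -= lr * (R @ V + lam * U)    # gradient of the loss w.r.t. U
        V -= lr * (R.T @ U + gam * V)  # gradient of the loss w.r.t. V
    return U, V
```

Other choices of link and loss (e.g. logistic link with Bernoulli loss) change only the residual computation, which is what makes the decomposable-loss view attractive.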

Highlights

- Relational data consists of entities and relations between them
- We demonstrate that a general approach to collective matrix factorization can work efficiently on large, sparse data sets with relational schemas and nonlinear link functions
- If the prediction link and loss correspond to a Bernoulli distribution, margin losses are special cases of biases; methods based on plate models, such as pLSI [19], can be placed in our framework just as well as methods that factor data matrices. While these features can be added to collective matrix factorization, we focus primarily on relational issues
- If we use a Hinge loss for each of these binary predictions and add the losses together, the result is equivalent to a collective matrix factorization where E1 are users, E2 are movies, and E1 ∼u E2 for u = 1
- We provide an example where the additional flexibility of collective matrix factorization leads to better results; and another where a co-clustering model, pLSI-pHITS, has the advantage
- We present a unified view of matrix factorization, building on it to provide collective matrix factorization as a model of pairwise relational data
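
The setup running through these highlights (users, movies, genres) can be sketched as a collective factorization of two matrices that share the movie factor. The sketch below uses identity links, squared loss, and a relation weight α; all names and settings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def collective_factorize(X, Y, k, alpha=0.5, lr=0.05, iters=4000, seed=0):
    """Jointly factor X ~ U V^T (user x movie) and Y ~ V Z^T (movie x genre),
    sharing the movie factor V.
    Loss: alpha*||X - U V^T||^2 / 2 + (1-alpha)*||Y - V Z^T||^2 / 2."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = X.shape
    _, n_genres = Y.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_movies, k))
    Z = 0.1 * rng.standard_normal((n_genres, k))
    for _ in range(iters):
        Rx = U @ V.T - X   # residual on the ratings relation
        Ry = V @ Z.T - Y   # residual on the genre relation
        U -= lr * alpha * (Rx @ V)
        Z -= lr * (1 - alpha) * (Ry.T @ V)
        # V receives gradient contributions from both relations
        V -= lr * (alpha * Rx.T @ U + (1 - alpha) * Ry @ Z)
    return U, V, Z
```

The shared factor V is the point of the method: information in the movie–genre relation flows into the user–movie predictions and vice versa.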

Results

- The authors distinguish the work from prior methods on three points: (i) competing methods often impose a clustering constraint, whereas the authors cover both cluster and factor analysis; (ii) the stochastic Newton method lets them handle large, sparsely observed relations by taking advantage of the decomposability of the loss; and (iii) the presentation is more general, covering a wider variety of models, schemas, and losses.
- For (iii), the model emphasizes that there is little difference between factoring two matrices versus three or more; and the optimization procedure can use any twice-differentiable decomposable loss, including the important class of Bregman divergences.
- If the authors use a hinge loss for each of these binary predictions and add the losses together, the result is equivalent to a collective matrix factorization where E1 are users, E2 are movies, and E1 ∼u E2 for u = 1.
- The dense rating scenario, Figure 1, shows that collective matrix factorization improves both prediction tasks: whether a user rated a movie, and which genres a movie belongs to.
- On a problem with three entity types, with n1 = 100,000 users, n2 = 5,000 movies, and n3 = 21 genres, and over 1.3M observed ratings, alternating projection with full Newton steps runs to convergence in 32 minutes on a single 1.6 GHz CPU.
- The authors provide an example where the additional flexibility of collective matrix factorization leads to better results; and another where a co-clustering model, pLSI-pHITS, has the advantage.
- Since pLSI-pHITS is a co-clustering method, and the collective matrix factorization model is a link prediction method, the authors choose a measure that favours neither inherently: ranking.
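
For identity links and squared loss, each alternating-projection update with a full Newton step has a closed form: every row of U solves a ridge regression against the current V. A minimal sketch under those assumptions (function names and defaults are illustrative, not the paper's code):

```python
import numpy as np

def newton_update(X, V, lam):
    """Full Newton step for U with V fixed, identity link, squared loss:
    minimize ||X - U V^T||_F^2 + lam ||U||_F^2 row by row.
    The normal equations give U = X V (V^T V + lam I)^{-1}."""
    k = V.shape[1]
    return X @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))

def alternating_projections(X, k, lam=0.1, iters=50, seed=0):
    """Alternate exact (Newton) updates of U and V."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((X.shape[1], k))
    for _ in range(iters):
        U = newton_update(X, V, lam)
        V = newton_update(X.T, U, lam)
    return U, V
```

For non-identity links the per-row subproblem is no longer quadratic, so the Newton step is an approximation rather than an exact solve, but the alternating structure is the same.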

Conclusion

- The authors compare four different models for generating rankings of movies for users, among them CMF-Identity: collective matrix factorization using identity prediction links, f1(θ) = f2(θ) = θ, and squared loss.
- The authors present a novel application of stochastic approximation to collective matrix factorization, which allows one to handle even larger matrices using a sampled approximation to the gradient and Hessian, with provable convergence and a fast rate of convergence in practice.
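
The sampled-approximation idea can be sketched with a first-order stand-in: iterate over observed entries and update only the factor rows each entry touches, so the per-step cost is independent of the matrix size. This is plain SGD, not the paper's stochastic Newton method, and all names and settings are illustrative:

```python
import numpy as np

def sgd_factorize(rows, cols, vals, shape, k, lr=0.05, lam=1e-3,
                  epochs=200, seed=0):
    """SGD over observed entries (rows[t], cols[t]) -> vals[t] of a
    shape[0] x shape[1] matrix, squared loss, identity link."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((shape[0], k))
    V = 0.1 * rng.standard_normal((shape[1], k))
    n = len(vals)
    for _ in range(epochs):
        for t in rng.permutation(n):       # visit observations in random order
            i, j, x = rows[t], cols[t], vals[t]
            e = U[i] @ V[j] - x            # error on this single entry
            U[i], V[j] = (U[i] - lr * (e * V[j] + lam * U[i]),
                          V[j] - lr * (e * U[i] + lam * V[j]))
    return U, V
```

Because each step touches only one row of U and one row of V, the method scales to sparsely observed relations where forming the full gradient would be wasteful.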

Related work

- Collective matrix factorization provides a unified view of matrix factorization for relational data: different methods correspond to different distributional assumptions on individual matrices, different schemas tying factors together, and different optimization procedures. We distinguish our work from prior methods on three points: (i) competing methods often impose a clustering constraint, whereas we cover both cluster and factor analysis (although our experiments focus on factor analysis); (ii) our stochastic Newton method lets us handle large, sparsely observed relations by taking advantage of decomposability of the loss; and (iii) our presentation is more general, covering a wider variety of models, schemas, and losses. In particular, for (iii), our model emphasizes that there is little difference between factoring two matrices versus three or more; and our optimization procedure can use any twice-differentiable decomposable loss, including the important class of Bregman divergences. For example, if we restrict our model to a single relation E1 ∼ E2, we can recover all of the single-matrix models mentioned in Sec. 2.2. While our alternating projections approach is conceptually simple, and allows one to take advantage of decomposability, there is a panoply of alternatives for factoring a single matrix. The more popular ones include majorization [22], which iteratively minimizes a sequence of convex upper-bounding functions tangent to the objective, including the multiplicative update for NMF [21] and the EM algorithm, which is used both for pLSI [19] and weighted SVD [32]. Direct optimization solves the non-convex problem with respect to (U, V) using gradient or second-order methods, such as the fast variant of max-margin matrix factorization [30].
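
As a concrete instance of the majorization approach mentioned above, the Lee–Seung multiplicative updates for NMF [21] with squared loss can be sketched as follows; each update is a majorization step that never increases the objective. This is an illustrative sketch, not the paper's code:

```python
import numpy as np

def nmf(X, k, iters=1000, eps=1e-9, seed=0):
    """Multiplicative updates for X ~ W H with W, H >= 0, squared loss.
    The small eps keeps the denominators strictly positive."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # majorization step in H
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # majorization step in W
    return W, H
```

Because the updates are elementwise multiplications by nonnegative ratios, nonnegativity of W and H is preserved automatically, with no projection step needed.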

Funding

- This research was funded in part by a grant from DARPA's RADAR program.

References

- D. Agarwal and S. Merugu. Predictive discrete latent factor models for large scale dyadic data. In KDD, pages 26–35, 2007.
- D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multi. Anal., 11(4):581–598, 1981.
- [4] K. S. Azoury and M. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Mach. Learn., 43:211–246, 2001.
- [5] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SDM. SIAM, 2007.
- [6] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. J. Mach. Learn. Res., 6:1705–1749, 2005.
- [7] L. Bottou. Online algorithms and stochastic approximations. In Online Learning and Neural Networks. Cambridge UP, 1998.
- [8] L. Bottou and Y. LeCun. Large scale online learning. In NIPS, 2003.
- [9] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge UP, 2004.
- [10] L. Bregman. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math and Math. Phys., 7:200–217, 1967.
- [11] Y. Censor and S. A. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Oxford UP, 1997.
- [12] P. P. Chen. The entity-relationship model: Toward a unified view of data. ACM Trans. Data. Sys., 1(1):9–36, 1976.
- [13] D. Cohn and T. Hofmann. The missing link–a probabilistic model of document content and hypertext connectivity. In NIPS, 2000.
- [14] M. Collins, S. Dasgupta, and R. E. Schapire. A generalization of principal component analysis to the exponential family. In NIPS, 2001.
- [15] J. Forster and M. K. Warmuth. Relative expected instantaneous loss bounds. In COLT, pages 90–99, 2000.
- [16] G. H. Golub and C. F. V. Loan. Matrix Computations. Johns Hopkins UP, 3rd edition, 1996.
- [17] G. J. Gordon. Generalized² linear² models. In NIPS, 2002.
- [18] D. Harman. Overview of the 2nd text retrieval conference (TREC-2). Inf. Process. Manag., 31(3):271–289, 1995.
- [19] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50–57, 1999.
- [20] Internet Movie Database Inc. IMDB interfaces. http://www.imdb.com/interfaces, Jan. 2007.
- [21] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2001.
- [22] J. de Leeuw. Block relaxation algorithms in statistics, 1994.
- [23] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML, pages 585–592, 2006.
- [24] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Relational clustering by symmetric convex coding. In ICML, pages 569–576, 2007.
- [25] B. Long, Z. M. Zhang, and P. S. Yu. A probabilistic framework for relational clustering. In KDD, pages 470–479, 2007.
- [26] P. McCullagh and J. Nelder. Generalized Linear Models. Chapman and Hall: London., 1989.
- [27] Netflix. Netflix prize dataset. http://www.netflixprize.com, Jan. 2007.
- [28] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
- [29] F. Pereira and G. Gordon. The support vector decomposition machine. In ICML, pages 689–696, 2006.
- [30] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, pages 713–719, 2005.
- [31] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. Technical Report CMU-ML-08-109, Machine Learning Department, Carnegie Mellon University, 2008.
- [32] N. Srebro and T. Jaakkola. Weighted low-rank approximations. In ICML, 2003.
- [33] N. Srebro, J. D. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In NIPS, 2004.
- [34] P. Stoica and Y. Selen. Cyclic minimizers, majorization techniques, and the expectation-maximization algorithm: a refresher. Sig. Process. Mag., IEEE, 21(1):112–114, 2004.
- [35] K. Yu, S. Yu, and V. Tresp. Multi-label informed latent semantic indexing. In SIGIR, pages 258–265, 2005.
- [36] S. Yu, K. Yu, V. Tresp, H.-P. Kriegel, and M. Wu. Supervised probabilistic principal component analysis. In KDD, pages 464–473, 2006.
- [37] S. Zhu, K. Yu, Y. Chi, and Y. Gong. Combining content and link for classification using matrix factorization. In SIGIR, pages 487–494, 2007.
