AI Insight

We propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations.

Relational learning via collective matrix factorization

KDD, pp. 650-658, 2008.

Cited by: 355

Abstract

Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and acto...

Introduction
  • Relational data consists of entities and relations between them. In many cases, such as relational databases, the numbers of entity types and relation types are fixed.
  • One model of Bregman matrix factorization [17] proposes the following decomposable loss function for X ≈ f1(U Vᵀ): L1(U, V | W) = D_F1(U Vᵀ || X, W) + D_G(0 || U) + D_H(0 || V), where the regularizers G(u) = λu²/2 and H(v) = γv²/2, with λ, γ > 0, correspond to ℓ2 regularization.
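For concreteness, the snippet below is a minimal NumPy sketch of the squared-loss (Gaussian, identity-link) instance of this objective. It is an illustration rather than the authors' code; the function name and toy data are invented for the example.

```python
import numpy as np

def single_matrix_loss(U, V, X, W, lam=1.0, gamma=1.0):
    """Weighted squared-loss instance of the decomposable objective
    L1(U, V | W) = D_F1(U V^T || X, W) + D_G(0 || U) + D_H(0 || V).
    Entries with W[i, j] = 0 (unobserved) contribute nothing, and the
    regularizers use G(u) = lam*u^2/2 and H(v) = gamma*v^2/2."""
    R = U @ V.T - X                       # reconstruction residuals
    data_term = 0.5 * np.sum(W * R ** 2)  # weighted divergence (Gaussian case)
    reg_U = 0.5 * lam * np.sum(U ** 2)    # D_G(0 || U) = lam * ||U||^2 / 2
    reg_V = 0.5 * gamma * np.sum(V ** 2)  # D_H(0 || V) = gamma * ||V||^2 / 2
    return data_term + reg_U + reg_V

# Toy usage: a 6 x 4 matrix with roughly half of its entries observed.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
W = (rng.random((6, 4)) < 0.5).astype(float)   # observed-entry mask / weights
U = rng.normal(size=(6, 2))
V = rng.normal(size=(4, 2))
print(single_matrix_loss(U, V, X, W))
```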
Highlights
  • Relational data consists of entities and relations between them.
  • We demonstrate that a general approach to collective matrix factorization can work efficiently on large, sparse data sets with relational schemas and nonlinear link functions.
  • If the prediction link and loss correspond to a Bernoulli distribution, margin losses are special cases of biases; methods based on plate models, such as pLSI [19], can be placed in our framework just as well as methods that factor data matrices. While these features can be added to collective matrix factorization, we focus primarily on relational issues.
  • If we use a Hinge loss for each of these binary predictions and add the losses together, the result is equivalent to a collective matrix factorization where E1 are users, E2 are movies, and E1 ∼u E2 for u = 1
  • We provide an example where the additional flexibility of collective matrix factorization leads to better results; and another where a co-clustering model, pLSI-pHITS, has the advantage.
  • We present a unified view of matrix factorization, building on it to provide collective matrix factorization as a model of pairwise relational data; a toy sketch of two relations sharing a factor follows this list.
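The toy sketch below illustrates the shared-factor idea behind these highlights under simplifying assumptions (identity prediction links, squared loss, fully observed matrices, plain gradient descent); it is not the authors' algorithm. A users × movies matrix X1 ≈ U Vᵀ is factored jointly with a movies × genres matrix X2 ≈ V Zᵀ, and the movie factor V is shared because movies participate in both relations. All names are illustrative.

```python
import numpy as np

def collective_factorization(X1, X2, k=5, lam=0.1, lr=0.01, iters=500, seed=0):
    """Jointly factor X1 ~ U V^T (users x movies) and X2 ~ V Z^T (movies x genres).

    The movie factor V is shared across both relations, so evidence about a
    movie's genres influences the latent features used to predict its ratings.
    Squared loss + l2 regularization, plain full-gradient descent."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = X1.shape
    _, n_genres = X2.shape
    U = 0.1 * rng.normal(size=(n_users, k))
    V = 0.1 * rng.normal(size=(n_movies, k))
    Z = 0.1 * rng.normal(size=(n_genres, k))
    for _ in range(iters):
        R1 = U @ V.T - X1                      # residual of the ratings relation
        R2 = V @ Z.T - X2                      # residual of the genre relation
        grad_U = R1 @ V + lam * U
        grad_V = R1.T @ U + R2 @ Z + lam * V   # V collects gradients from both relations
        grad_Z = R2.T @ V + lam * Z
        U -= lr * grad_U
        V -= lr * grad_V
        Z -= lr * grad_Z
    return U, V, Z

# Tiny synthetic example: 20 users, 10 movies, 3 genres.
rng = np.random.default_rng(1)
X1 = rng.random((20, 10))
X2 = (rng.random((10, 3)) < 0.3).astype(float)
U, V, Z = collective_factorization(X1, X2, k=4)
print(np.linalg.norm(U @ V.T - X1), np.linalg.norm(V @ Z.T - X2))
```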
Results
  • The authors distinguish the work from prior methods on three points: (i) competing methods often impose a clustering constraint, whereas the authors cover both cluster and factor analysis; (ii) the stochastic Newton method lets them handle large, sparsely observed relations by taking advantage of decomposability of the loss (see the row-update sketch after this list); and (iii) the presentation is more general, covering a wider variety of models, schemas, and losses.
  • In particular, for point (iii), the model emphasizes that there is little difference between factoring two matrices versus three or more; and the optimization procedure can use any twice-differentiable decomposable loss, including the important class of Bregman divergences.
  • If the authors use a Hinge loss for each of these binary predictions and add the losses together, the result is equivalent to a collective matrix factorization where E1 are users, E2 are movies, and E1 ∼u E2 for u = 1.
  • The dense rating scenario, Figure 1, shows that collective matrix factorization improves both prediction tasks: whether a user rated a movie, and which genres a movie belongs to.
  • On a three factor problem with n1 = 100000 users, n2 = 5000 movies, and n3 = 21 genres, with over 1.3M observed ratings, alternating projection with full Newton steps runs to convergence in 32 minutes on a single 1.6 GHz CPU.
  • The authors provide an example where the additional flexibility of collective matrix factorization leads to better results; and another where a co-clustering model, pLSI-pHITS, has the advantage.
  • Since pLSI-pHITS is a co-clustering method, and the collective matrix factorization model is a link prediction method, the authors choose a measure that favours neither inherently: ranking.
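The decomposability referred to above means that, with one factor matrix held fixed, each row of the other can be updated independently. The sketch below shows the simplest case only: weighted squared loss, where the per-row Newton step has a closed form (weighted ridge regression). It is a minimal illustration under that assumption, not the paper's general procedure for arbitrary twice-differentiable losses; the function names are invented for the example.

```python
import numpy as np

def newton_row_update(x_i, w_i, V, lam=1.0):
    """Newton (here: closed-form) update for one row u_i of U, with V held fixed.

    With squared loss the per-row subproblem
        min_u  0.5 * sum_j w_ij * (x_ij - u . v_j)^2  +  0.5 * lam * ||u||^2
    is weighted ridge regression, so a single Newton step lands on the minimizer:
        u = (V^T diag(w_i) V + lam I)^{-1} V^T diag(w_i) x_i."""
    k = V.shape[1]
    WV = V * w_i[:, None]                 # diag(w_i) @ V without forming diag
    H = V.T @ WV + lam * np.eye(k)        # per-row Hessian
    g = WV.T @ x_i                        # per-row target V^T diag(w_i) x_i
    return np.linalg.solve(H, g)

def alternating_newton(X, W, k=5, lam=1.0, sweeps=20, seed=0):
    """Alternate full Newton row updates for U and V on a weighted squared loss."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.normal(size=(m, k))
    V = 0.1 * rng.normal(size=(n, k))
    for _ in range(sweeps):
        for i in range(m):
            U[i] = newton_row_update(X[i], W[i], V, lam)
        for j in range(n):
            V[j] = newton_row_update(X[:, j], W[:, j], U, lam)
    return U, V

# Usage on a small, sparsely observed matrix (~20% of entries observed).
rng = np.random.default_rng(2)
X = rng.random((50, 30))
W = (rng.random((50, 30)) < 0.2).astype(float)
U, V = alternating_newton(X, W, k=3)
```

A practical implementation would restrict each row update to that row's observed entries; the dense masked version here just keeps the example short.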
Conclusion
  • The authors compare four different models for generating rankings of movies for users; among them is CMF-Identity: collective matrix factorization using identity prediction links, f1(θ) = f2(θ) = θ, and squared loss.
  • The authors present a novel application of stochastic approximation to collective matrix factorization, which allows one to handle even larger matrices using a sampled approximation to the gradient and Hessian, with provable convergence and a fast rate of convergence in practice.
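To make the sampled-gradient/Hessian idea concrete, here is a minimal sketch for one row in the squared-loss case. It is an illustration only, not the paper's algorithm (which covers general Bregman losses and specifies the step-size conditions behind its convergence guarantee): a mini-batch of the row's observed columns is drawn and rescaled so the sampled gradient and Hessian are unbiased estimates of the full ones. All names are invented for the example.

```python
import numpy as np

def stochastic_newton_row_update(u_i, x_i, w_i, V, lam=1.0, batch=64, step=1.0, rng=None):
    """One Newton step for row u_i using a sampled gradient/Hessian (squared loss).

    Instead of visiting every observed entry in the row, draw `batch` observed
    columns uniformly and rescale by (#observed / #sampled) so the estimates
    are unbiased. `step` < 1 damps the Newton step if desired."""
    rng = np.random.default_rng() if rng is None else rng
    obs = np.flatnonzero(w_i)                     # observed columns in this row
    if obs.size == 0:
        return u_i
    S = rng.choice(obs, size=min(batch, obs.size), replace=False)
    scale = obs.size / S.size                     # makes the estimates unbiased
    Vs, xs, ws = V[S], x_i[S], w_i[S]
    r = Vs @ u_i - xs                             # residuals on the sample
    g = scale * (Vs.T @ (ws * r)) + lam * u_i     # sampled gradient
    H = scale * (Vs.T @ (Vs * ws[:, None])) + lam * np.eye(V.shape[1])  # sampled Hessian
    return u_i - step * np.linalg.solve(H, g)

# Toy usage: refine one user's factor row from a sample of their observed ratings.
rng = np.random.default_rng(3)
V = rng.normal(size=(1000, 5))                   # movie factors, held fixed
x_i = rng.random(1000)                           # one user's ratings row
w_i = (rng.random(1000) < 0.05).astype(float)    # ~5% of entries observed
u_i = np.zeros(5)
for _ in range(10):
    u_i = stochastic_newton_row_update(u_i, x_i, w_i, V, batch=16, step=0.5, rng=rng)
```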
Related work
  • Collective matrix factorization provides a unified view of matrix factorization for relational data: different methods correspond to different distributional assumptions on individual matrices, different schemas tying factors together, and different optimization procedures. We distinguish our work from prior methods on three points: (i) competing methods often impose a clustering constraint, whereas we cover both cluster and factor analysis (although our experiments focus on factor analysis); (ii) our stochastic Newton method lets us handle large, sparsely observed relations by taking advantage of decomposability of the loss; and (iii) our presentation is more general, covering a wider variety of models, schemas, and losses. In particular, for (iii), our model emphasizes that there is little difference between factoring two matrices versus three or more; and our optimization procedure can use any twice-differentiable decomposable loss, including the important class of Bregman divergences. For example, if we restrict our model to a single relation E1 ∼ E2, we can recover all of the single-matrix models mentioned in Sec. 2.2.
  • While our alternating-projections approach is conceptually simple, and allows one to take advantage of decomposability, there is a panoply of alternatives for factoring a single matrix. The more popular ones include majorization [22], which iteratively minimizes a sequence of convex upper-bounding functions tangent to the objective; instances include the multiplicative update for NMF [21] and the EM algorithm, which is used both for pLSI [19] and weighted SVD [32]. Direct optimization solves the non-convex problem with respect to (U, V) using gradient or second-order methods, such as the fast variant of max-margin matrix factorization [30].
Funding
  • This research was funded in part by a grant from DARPA’s RADAR program.
References
  • [1] D. Agarwal and S. Merugu. Predictive discrete latent factor models for large scale dyadic data. In KDD, pages 26–35, 2007.
  • [2] D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multi. Anal., 11(4):581–598, 1981.
  • [4] K. S. Azoury and M. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Mach. Learn., 43:211–246, 2001.
  • [5] A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SDM. SIAM, 2007.
  • [6] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. J. Mach. Learn. Res., 6:1705–1749, 2005.
  • [7] L. Bottou. Online algorithms and stochastic approximations. In Online Learning and Neural Networks. Cambridge UP, 1998.
  • [8] L. Bottou and Y. LeCun. Large scale online learning. In NIPS, 2003.
  • [9] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge UP, 2004.
  • [10] L. Bregman. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math and Math. Phys., 7:200–217, 1967.
  • [11] Y. Censor and S. A. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Oxford UP, 1997.
  • [12] P. P. Chen. The entity-relationship model: Toward a unified view of data. ACM Trans. Data. Sys., 1(1):9–36, 1976.
  • [13] D. Cohn and T. Hofmann. The missing link–a probabilistic model of document content and hypertext connectivity. In NIPS, 2000.
  • [14] M. Collins, S. Dasgupta, and R. E. Schapire. A generalization of principal component analysis to the exponential family. In NIPS, 2001.
  • [15] J. Forster and M. K. Warmuth. Relative expected instantaneous loss bounds. In COLT, pages 90–99, 2000.
  • [16] G. H. Golub and C. F. V. Loan. Matrix Computations. Johns Hopkins UP, 3rd edition, 1996.
  • [17] G. J. Gordon. Generalized² linear² models. In NIPS, 2002.
  • [18] D. Harman. Overview of the 2nd text retrieval conference (TREC-2). Inf. Process. Manag., 31(3):271–289, 1995.
  • [19] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50–57, 1999.
  • [20] Internet Movie Database Inc. IMDB interfaces. http://www.imdb.com/interfaces, Jan. 2007.
  • [21] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2001.
  • [22] J. D. Leeuw. Block relaxation algorithms in statistics, 1994.
  • [23] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML, pages 585–592, 2006.
  • [24] B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Relational clustering by symmetric convex coding. In ICML, pages 569–576, 2007.
  • [25] B. Long, Z. M. Zhang, and P. S. Yu. A probabilistic framework for relational clustering. In KDD, pages 470–479, 2007.
  • [26] P. McCullagh and J. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.
  • [27] Netflix. Netflix prize dataset. http://www.netflixprize.com, Jan. 2007.
  • [28] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
  • [29] F. Pereira and G. Gordon. The support vector decomposition machine. In ICML, pages 689–696, 2006.
  • [30] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, pages 713–719, 2005.
  • [31] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. Technical Report CMU-ML-08-109, Machine Learning Department, Carnegie Mellon University, 2008.
  • [32] N. Srebro and T. Jaakkola. Weighted low-rank approximations. In ICML, 2003.
  • [33] N. Srebro, J. D. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In NIPS, 2004.
  • [34] P. Stoica and Y. Selen. Cyclic minimizers, majorization techniques, and the expectation-maximization algorithm: a refresher. Sig. Process. Mag., IEEE, 21(1):112–114, 2004.
  • [35] K. Yu, S. Yu, and V. Tresp. Multi-label informed latent semantic indexing. In SIGIR, pages 258–265, 2005.
  • [36] S. Yu, K. Yu, V. Tresp, H.-P. Kriegel, and M. Wu. Supervised probabilistic principal component analysis. In KDD, pages 464–473, 2006.
  • [37] S. Zhu, K. Yu, Y. Chi, and Y. Gong. Combining content and link for classification using matrix factorization. In SIGIR, pages 487–494, 2007.