Our model is able to capture the sentiment in each aspect of a review, and predict partial scores under different aspects
Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS)
KDD, pp.193-202, (2014)
Recommendation and review sites offer a wealth of information beyond ratings. For instance, on IMDb users leave reviews, commenting on different aspects of a movie (e.g. actors, plot, visual effects), and expressing their sentiments (positive or negative) on these aspects in their reviews. This suggests that uncovering aspects and sentime...
- Collaborative filtering is a staple of many businesses in the internet economy. Data for building good content recommender systems essentially comes in three guises: interactions, ratings, and reviews.
- There is rating information regarding whether the user enjoyed the recommended item.
- This is the traditional domain of collaborative filtering.
- The top three background words are ‘film’, ‘story’, and ‘character’, all of which provide little information about aspects or sentiments.
- This is not a mistake, as the word ‘nasty’ can convey positive or negative connotations for different users at the same time.
- Our model outperforms state-of-the-art recommender systems such as matrix factorization.
- Aspect-sentiments contain sentiment words specific to aspects, e.g. “spectacular” for the “Adventure” aspect, “sharp” for the “Social” aspect, and “nasty” for the “Violence” aspect. These words show why sentiment words need to be distinguished by aspect.
- The authors' model outperforms state-of-the-art recommender systems such as matrix factorization.
- As is common in collaborative filtering, only a tiny fraction of matrix entries are present; less than 0.03% of the entries in the dataset were observed.
- The authors' model outperforms state-of-the-art methods in terms of MSE on recommendation.
- Across the different latent factor sizes tested, the authors' model achieves its best performance when the number of aspects is 20.
- In this paper the authors propose JMARS, which provides superior recommendations by exploiting all the available data sources.
- To this end, the authors incorporate information from both reviews and ratings.
- The user interests and movie topics can be inferred with the integrated model.
- Future work includes capturing the hierarchical nature of movie topics and incorporating non-parametric models to increase flexibility.
- A fast inference algorithm is required to further increase the scalability of this model.
- Table 1: IMDb data set. Unigrams containing stop words or punctuation, as well as infrequent unigrams that appear fewer than five times in the corpus, are removed during pruning.
- Table 2: Comparison of models in terms of perplexity on held-out data for different numbers of topics and latent factor sizes.
- Table 3: Comparison of models in terms of MSE on held-out data. † and ‡ mean the result is better than the methods in the previous columns at the 1% and 0.1% significance levels, respectively, as measured by McNemar’s test.
- Table 4: The learnt aspect-specific ratings and latent sentiment identified by our model for a review.
- Table 5: Top background words from φ0 and sentiment words from φs.
- Table 6: Top topic words from φa for three topics, measured by aggregating all θu,m across reviews. The aspect labels (adventure, violence, social) are manually assigned.
- Table 7: Top movie-specific words from φm.
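For reference, the two held-out metrics named in the captions for Tables 2 and 3 are conventionally defined as below. This is a sketch using the standard textbook definitions, and the paper's exact normalization may differ. The symbols are assumptions introduced here for illustration: $\mathcal{T}$ is the set of held-out (user, movie) pairs, $r_{u,m}$ and $\hat{r}_{u,m}$ are the observed and predicted ratings, $w_{u,m}$ is the text of the review, and $N_{u,m}$ is its length in words. Lower is better for both metrics.

```latex
\mathrm{MSE} = \frac{1}{|\mathcal{T}|} \sum_{(u,m) \in \mathcal{T}} \left( r_{u,m} - \hat{r}_{u,m} \right)^2,
\qquad
\mathrm{perplexity} = \exp\!\left( - \frac{\sum_{(u,m) \in \mathcal{T}} \log p(w_{u,m})}{\sum_{(u,m) \in \mathcal{T}} N_{u,m}} \right)
```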
- Collaborative filtering is a fertile area of research and there exists a multitude of techniques which can readily be applied to subsets of the problem that we tackle. See e.g. [18, 9] for a review. Specifically, probabilistic matrix factorization methods [15, 17] have proven successful in real world problems [3, 8, 11, 25, 22].
However, probabilistic matrix factorization techniques struggle to generalize to new items, i.e. they fail at the cold-start problem. Regression-based latent factor models (RLFM) use attribute features to solve this problem by incorporating observable features into latent factors. Recent research [22, 16] incorporates Latent Dirichlet Allocation (LDA) and uses the topics as features, e.g. for recommending scientific articles. In terms of ratings, other work uses a statistically more appropriate model for capturing the discrete nature of the reviews by formulating an exponential family approach.
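The matrix factorization baseline discussed above can be sketched as follows. This is a minimal, illustrative sketch in plain Python, not the authors' implementation and not JMARS itself: it fits a low-rank model around the global mean rating by stochastic gradient descent over the observed entries only, then reports MSE on held-out ratings. The function names (`train_mf`, `predict`, `mse`), the hyperparameters, and the toy ratings are all assumptions made for illustration.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by SGD, touching only observed (u, i, r) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    # Small random initial factors for every user and item.
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))


def mse(model, ratings):
    return sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Hypothetical toy data on a 1-10 scale (not taken from the paper):
# users 0-1 and users 2-3 have opposite tastes over items 0-1 versus 2-3.
train = [
    (0, 0, 9.0), (0, 1, 9.0), (0, 2, 2.0),
    (1, 0, 9.0), (1, 2, 2.0), (1, 3, 2.0),
    (2, 0, 2.0), (2, 1, 2.0), (2, 3, 9.0),
    (3, 1, 2.0), (3, 2, 9.0), (3, 3, 9.0),
]
held_out = [(0, 3, 2.0), (1, 1, 9.0), (2, 2, 9.0), (3, 0, 2.0)]

model = train_mf(train, n_users=4, n_items=4)
print("train MSE:", mse(model, train))
print("held-out MSE:", mse(model, held_out))
```

In this sketch, a user or item with no observed ratings is never updated and keeps its random initial factors, which illustrates the cold-start limitation noted above.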
- This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office, Media Development Authority (MDA).
- D. Agarwal and B.-C. Chen. Regression-based latent factor models. In J. Elder, F. Fogelman-Soulie, P. Flach, and M. Zaki, editors, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 19–28. ACM, 2009.
- A. Ahmed and E. P. Xing. Staying informed: supervised and semi-supervised multi-view topical analysis of ideological perspective. In Conference on Empirical Methods in Natural Language Processing, pages 1140–1150. ACL, 2010.
- R. M. Bell and Y. Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
- A. Z. Broder. Computational advertising and recommender systems. In P. Pu, D. G. Bridge, B. Mobasher, and F. Ricci, editors, Conference on Recommender Systems, pages 1–2. ACM, 2008.
- J.-F. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
- T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.
- L. Hong, A. Ahmed, S. Gurumurthy, A. Smola, and K. Tsioutsiouliklis. Discovering geographical topics in the Twitter stream. In International Conference on World Wide Web, 2012.
- Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.
- A. Lazaridou, I. Titov, and C. Sporleder. A Bayesian model for joint unsupervised induction of sentiment, aspect and discourse representations. In Annual Meeting of the Association for Computational Linguistics, pages 1630–1639, 2013.
- H. Ma, H. Yang, M. R. Lyu, and I. King. SoRec: Social recommendation using probabilistic matrix factorization. In Conference on Information and Knowledge Management, pages 931–940, 2008.
- J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Conference on Recommender Systems, pages 165–172, 2013.
- J. J. McAuley, J. Leskovec, and D. Jurafsky. Learning attitudes and attributes from multi-aspect reviews. In International Conference on Data Mining, pages 1020–1025, 2012.
- Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai. Topic sentiment mixture: Modeling facets and opinions in weblogs. In International Conference on World Wide Web, pages 171–180, 2007.
- A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In Neural Information Processing Systems Conference, pages 1257–1264, 2007.
- I. Porteous, E. Bart, and M. Welling. Multi-HDP: A non parametric Bayesian model for tensor factorization. In D. Fox and C. Gomes, editors, Conference on Artificial Intelligence, pages 1487–1490, 2008.
- R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In W. Cohen, A. McCallum, and S. Roweis, editors, International Conference on Machine Learning, volume 307, pages 880–887. ACM, 2008.
- X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009 4:2, Jan. 2009.
- C. Tan, E. H. Chi, D. Huffaker, G. Kossinets, and A. J. Smola. Instant foodie: Predicting expert ratings from grassroots. In Conference on Information and Knowledge Management, 2013.
- I. Titov and R. McDonald. A joint model of text and aspect ratings for sentiment summarization. In Annual Meeting of the Association for Computational Linguistics, pages 308–316, Columbus, Ohio, 2008.
- C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 448–456, 2011.
- H. Wang, Y. Lu, and C. Zhai. Latent aspect rating analysis without aspect keyword supervision. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 618–626, 2011.
- M. Weimer, A. Karatzoglou, Q. Le, and A. J. Smola. Cofi rank - maximum margin matrix factorization for collaborative ranking. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, 2008.
- S.-H. Yang, B. Long, A. Smola, H. Zha, and Z. Zheng. Collaborative competitive filtering: learning recommender using context of user choice. In W.-Y. Ma, J.-Y. Nie, R. A. Baeza-Yates, T.-S. Chua, and W. B. Croft, editors, Research and Development in Information Retrieval, pages 295–304. ACM, 2011.
- X. Zhao, J. Jiang, H. Yan, and X. Li. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In Conference on Empirical Methods in Natural Language Processing, pages 56–65, 2010.