SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 3183–3193.


Abstract:

Personalized recommendation products at Twitter target a multitude of heterogeneous items: Tweets, Events, Topics, Hashtags, and users. Each of these targets varies in its cardinality (which affects the scale of the problem) and its "shelf life" (which constrains the latency of generating the recommendations). Although Twitter has bu...

Introduction
  • Personalized recommendations lie at the heart of many different technology-enabled products, and Twitter is no exception.
  • The authors' high-level goal is to make content discovery effortless and to free the user from the need for manual curation.
  • On the Twitter platform, a wide variety of content types are displayed in a multitude of contexts, requiring a variety of personalization approaches.
  • Recommendations of interesting Tweets are an essential component of the Home tab and are also used for dissemination via email or …
Highlights
  • (2) Communities of Right Nodes: We discover communities from the similarity graph of right nodes built in step (1), using a novel neighborhood-based sampling algorithm that is inspired by the work of [33] but is much more accurate and faster, and that scales to graphs with billions of edges
  • To exactly track the row-wise and column-wise top-k views of W, we must track the entirety of W. If W turns out to be too big to track in its entirety, sketches can keep a summary of W at the cost of introducing errors [4, 15]; in practice, we have found this unnecessary
  • We developed new SimClusters representations for users based on the user–user block graph, and used these representations as features to train a model for filtering out abusive and spammy replies
  • We proposed a framework called SimClusters, based on detecting bipartite communities from the user–user graph, and used these communities as a representation space to solve many personalization and recommendation problems at scale
  • We presented several diverse deployed and in-progress applications where we use SimClusters representations to improve relevance at Twitter
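The point about exact top-k tracking can be illustrated with a minimal in-memory sketch. It assumes W is a sparse matrix of non-negative scores updated by increments (for example, rows as items and columns as communities; that layout is an assumption here, not something stated above), and the class and method names are hypothetical. Because both views of W are stored in full, every row-wise and column-wise top-k answer is exact; replacing the dictionaries with a sketch such as count-min [4] would save memory at the cost of approximate values.

```python
import heapq
from collections import defaultdict


class ExactTopKTracker:
    """Hypothetical sketch: store all of W so row/column top-k queries are exact."""

    def __init__(self):
        # rows[i][j] and cols[j][i] both hold W[i][j]. Keeping W in full is
        # what makes the answers exact (no sketch, no approximation error).
        self.rows = defaultdict(dict)
        self.cols = defaultdict(dict)

    def add(self, i, j, delta):
        """Increment W[i][j] by delta, keeping both views in sync."""
        value = self.rows[i].get(j, 0.0) + delta
        self.rows[i][j] = value
        self.cols[j][i] = value

    def top_k_in_row(self, i, k):
        """Return the k largest (column, value) pairs of row i, largest first."""
        # .get avoids creating an empty row as a side effect of a query.
        return heapq.nlargest(k, self.rows.get(i, {}).items(), key=lambda kv: kv[1])

    def top_k_in_column(self, j, k):
        """Return the k largest (row, value) pairs of column j, largest first."""
        return heapq.nlargest(k, self.cols.get(j, {}).items(), key=lambda kv: kv[1])
```

Each query scans one row or column with `heapq.nlargest`, so its cost grows with that row's or column's number of non-zeros; the memory cost is two copies of W, which is exactly the trade-off a sketch would relax.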
Methods
  • Table 4 reports the number of non-zero entries per row (NNZ/row) in U and in V for SimClusters and NMF.
Results
  • Results on similarity graphs of Twitter users (Table 3): Neighborhood-Aware MH (Ours), BigClam, and Graclus are compared on precision, recall, F1, and running time.
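The comparison above scores recovered communities by precision, recall, and F1. How the paper matches detected communities to ground truth is not recoverable from this page; the sketch below uses pair-counting instead (a node pair is a hit when both clusterings put it in the same community), which is one common convention, and it assumes non-overlapping communities. The function names are hypothetical.

```python
from itertools import combinations


def same_community_pairs(assignment):
    """Return the set of unordered node pairs placed in the same community.

    `assignment` maps node -> community id (non-overlapping, a simplification).
    """
    by_community = {}
    for node, community in assignment.items():
        by_community.setdefault(community, []).append(node)
    pairs = set()
    for members in by_community.values():
        # Sorting makes each unordered pair appear in one canonical order.
        for u, v in combinations(sorted(members), 2):
            pairs.add((u, v))
    return pairs


def pairwise_prf(predicted, ground_truth):
    """Pair-counting precision, recall, and F1 of `predicted` vs `ground_truth`."""
    pred_pairs = same_community_pairs(predicted)
    true_pairs = same_community_pairs(ground_truth)
    hits = len(pred_pairs & true_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For example, with ground truth `{1: "a", 2: "a", 3: "a", 4: "b"}` and prediction `{1: 0, 2: 0, 3: 1, 4: 1}`, one of the two predicted pairs is correct and one of the three true pairs is found, giving precision 0.5, recall 1/3, and F1 0.4.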
Conclusion
  • The authors proposed a framework called SimClusters, based on detecting bipartite communities from the user–user graph, and used these communities as a representation space to solve many personalization and recommendation problems at scale.
  • SimClusters uses a novel algorithm called Neighborhood-aware MH for the crucial problem of unipartite community detection, with better scalability and accuracy than the BigClam and Graclus baselines it is compared against (Table 3).
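This page names Neighborhood-aware MH without describing it; the authoritative implementation is the open-sourced code at https://github.com/twitter/sbf. The sketch below only illustrates a Metropolis-style local search in that spirit, under assumptions of ours: each node belongs to exactly one community, a proposal copies the community of a random neighbor (the "neighborhood-aware" part), and a node's score rewards neighbors inside its community (weighted by a hypothetical parameter `alpha`) plus non-neighbors outside it. It omits the Hastings proposal-ratio correction, so it is a heuristic rather than an exact MH sampler.

```python
import math
import random


def neighborhood_aware_mh_sketch(adj, num_communities, alpha=10.0, epochs=20, seed=0):
    """Illustrative Metropolis-style community assignment (not the paper's code).

    adj: dict mapping node -> set of neighbors (undirected, no self-loops).
    Returns a dict mapping node -> community id in range(num_communities).
    Assumptions (ours): one community per node, random initialization,
    and the per-node score defined in `score` below.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    n = len(nodes)
    z = {u: rng.randrange(num_communities) for u in nodes}
    size = [0] * num_communities
    for c in z.values():
        size[c] += 1

    def score(u, c):
        """alpha * (neighbors in c) + (non-neighbors outside c), excluding u."""
        nbrs_in_c = sum(1 for v in adj[u] if z[v] == c)
        others_in_c = size[c] - (1 if z[u] == c else 0)
        non_nbrs_in_c = others_in_c - nbrs_in_c
        non_nbrs_outside_c = (n - 1 - len(adj[u])) - non_nbrs_in_c
        return alpha * nbrs_in_c + non_nbrs_outside_c

    for _ in range(epochs):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each epoch
        for u in nodes:
            if not adj[u]:
                continue  # isolated node: nothing to propose from
            # Neighborhood-aware proposal: copy a random neighbor's community.
            proposal = z[rng.choice(sorted(adj[u]))]
            if proposal == z[u]:
                continue
            delta = score(u, proposal) - score(u, z[u])
            # Accept improvements always, worse moves with probability exp(delta).
            if delta >= 0 or rng.random() < math.exp(delta):
                size[z[u]] -= 1
                size[proposal] += 1
                z[u] = proposal
    return z
```

One consequence of drawing proposals only from neighbors: nodes in disconnected components never influence each other, so two components that both settle on the same community id stay under that id. In this sketch the acceptance rule refines assignments locally; it cannot split them apart.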
Tables
  • Table 1: A partial list of recommendation problems at Twitter, along with the number of possible recommendable items, the shelf life of the recommendations, and where they are shown on Twitter
  • Table 2: Epochs and time comparison for a synthetic graph with planted communities, with 100 vertices and varying …
  • Table 3: Comparison with BigClam and Graclus for discovering communities from undirected graphs
  • Table 4: Comparison against NMF for the usefulness of the learned U, V for link prediction
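Table 4 evaluates the learned U and V for link prediction, alongside their sparsity. One natural way to use such factors, sketched here under our own assumptions, is to score a candidate edge from left node u to right node v as the (u, v) entry of U V^T, computed as a sparse dot product over shared communities. The dictionary layout and function names are hypothetical, and the paper's evaluation protocol is not given on this page.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {community_id: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[c] for c, w in a.items() if c in b)


def link_score(U, V, left, right):
    """Score a candidate edge (left -> right) as the (U V^T)[left, right] entry.

    U maps left node -> sparse community vector; V maps right node -> sparse
    community vector. Unknown nodes get an empty vector, hence a score of 0.
    """
    return sparse_dot(U.get(left, {}), V.get(right, {}))


def mean_nnz_per_row(M):
    """Average number of non-zero entries per row of a sparse matrix."""
    return sum(len(row) for row in M.values()) / len(M) if M else 0.0
```

Ranking all right nodes by `link_score` for a given left node yields a link-prediction list. Because the dot product only touches communities present in both rows, its cost grows with the number of non-zeros per row, which plausibly explains why sparsity is reported next to accuracy.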
Related work
  • Traditionally, approaches to recommender systems are categorized as either neighborhood-based (which do not involve model-fitting), or model-based (which fit a model to the input data).

    In our experience of building recommendations at Twitter, we find that neighborhood-based methods are easier to scale, more accurate, more interpretable, and also more flexible in terms of accommodating new users and/or items [9, 11, 12, 31]. Recent research has also found that well-tuned neighborhood-based methods are not easy to beat in terms of accuracy [6]. However, neighborhood-based approaches do not provide a general solution: in the past, we needed to build and maintain separate systems to solve each recommendation sub-problem at Twitter (see Section 1 for more discussion of our past work).

    Model-based approaches, such as factorized models [18], graph embeddings [10, 26], or VAEs [22], fit separate parameters for each user or item. The number of model parameters that need to be learned to scale to a billion-user social network can easily approach 10^12, necessitating unprecedentedly large systems for solving ML problems at that scale. Hybrid models, such as Factorization Machines [27] and Deep Neural Networks (DNNs) [5], have been introduced to reduce the parameter space by using side information as prior knowledge for users and items. However, they...
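To make the neighborhood-based idea concrete: such methods compute item similarities directly from interaction data instead of fitting per-user or per-item parameters. The sketch below builds an item-item similarity graph using cosine similarity over binary sets of engaging users; the choice of cosine, the binary weighting, and the function name are illustrative assumptions. It compares all item pairs, which is quadratic, so at Twitter scale a dedicated method for finding high-similarity pairs, such as the one in [32], would be needed.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_similarity_graph(edges, min_similarity=0.0):
    """Build an item-item similarity graph from (user, item) engagement pairs.

    Similarity is cosine over binary user sets: |U_a & U_b| / sqrt(|U_a| * |U_b|).
    No parameters are fitted; everything comes from the raw neighborhoods.
    Returns {(item_a, item_b): similarity} for pairs above min_similarity.
    """
    users_of = defaultdict(set)
    for user, item in edges:
        users_of[item].add(user)
    graph = {}
    # All-pairs comparison: fine for a sketch, quadratic in the number of items.
    for a, b in combinations(sorted(users_of), 2):
        overlap = len(users_of[a] & users_of[b])
        if overlap == 0:
            continue
        sim = overlap / math.sqrt(len(users_of[a]) * len(users_of[b]))
        if sim > min_similarity:
            graph[(a, b)] = sim
    return graph
```

For instance, if users u1 and u2 engage with item A while u1, u2, and u3 engage with item B, the A–B edge gets weight 2 / sqrt(6), about 0.816; an item with no shared users gets no edge.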
References
  • Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. Mixed Membership Stochastic Blockmodels. JMLR 9 (June 2008), 1981–2014.
  • Iván Cantador and Paolo Cremonesi. 2014. Tutorial on Cross-domain Recommender Systems. In RecSys ’14. 401–402.
  • Andrzej Cichocki and Anh-Huy Phan. 2009. Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations. IEICE Transactions 92-A (March 2009), 708–721.
  • Graham Cormode and Shan Muthukrishnan. 2005. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. Journal of Algorithms 55, 1 (2005), 58–75.
  • Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In RecSys ’16. 191–198.
  • Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In RecSys ’19. 101–109.
  • Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. 2007. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 11 (Nov. 2007), 1944–1957.
  • Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems. In WWW ’15. 278–288.
  • Ajeet Grewal, Jerry Jiang, Gary Lam, Tristan Jung, Lohith Vuddemarri, Quannan Li, Aaditya Landge, and Jimmy Lin. 2018. Recservice: Distributed Real-Time Graph Processing at Twitter. In HotCloud ’18. USENIX Association, 3.
  • Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable Feature Learning for Networks. In KDD ’16. 855–864.
  • Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The Who to Follow Service at Twitter. In WWW ’13. 505–514.
  • Pankaj Gupta, Venu Satuluri, Ajeet Grewal, Siva Gurumurthy, Volodymyr Zhabiuk, Quannan Li, and Jimmy Lin. 2014. Real-Time Twitter Recommendation: Online Motif Detection in Large Dynamic Graphs. Proceedings of the VLDB Endowment 7, 13 (2014), 1379–1380.
  • William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS ’17. 1025–1035.
  • Krishna Kamath, Aneesh Sharma, Dong Wang, and Zhijun Yin. 2014. RealGraph: User Interaction Prediction at Twitter. In User Engagement Optimization Workshop at KDD ’14.
  • Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou. 2003. A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Transactions on Database Systems (TODS) 28, 1 (2003), 51–55.
  • Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17.
  • Jon M. Kleinberg. 1999. Authoritative Sources in a Hyperlinked Environment. J. ACM 46, 5 (Sept. 1999), 604–632.
  • Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37.
  • Jérôme Kunegis. 2013. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion. 1343–1350.
  • R. Lempel and S. Moran. 2001. SALSA: The Stochastic Approach for Link-Structure Analysis. ACM Trans. Inf. Syst. 19, 2 (April 2001), 131–160.
  • Jure Leskovec and Rok Sosič. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.
  • Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In WWW ’18. 689–698.
  • David Melamed. 2014. Community Structures in Bipartite Networks: A Dual-Projection Approach. PLOS ONE 9, 5 (May 2014), 1–5.
  • Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. 2011. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In NIPS ’11.
  • F. Pedregosa et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In KDD ’14. 701–710.
  • Steffen Rendle. 2010. Factorization Machines. In ICDM ’10. IEEE, 995–1000.
  • Venu Satuluri and Srinivasan Parthasarathy. 2011. Symmetrizations for Clustering Directed Graphs. In EDBT/ICDT ’11. 343–354.
  • Venu Satuluri, Srinivasan Parthasarathy, and Yiye Ruan. 2011. Local Graph Sparsification for Scalable Clustering. In SIGMOD ’11. 721–732.
  • Sebastian Schelter, Venu Satuluri, and Reza Bosagh Zadeh. 2014. Factorbird: A Parameter Server Approach to Distributed Matrix Factorization. arXiv:1411.0602 (2014).
  • Aneesh Sharma, Jerry Jiang, Praveen Bommannavar, Brian Larson, and Jimmy Lin. 2016. GraphJet: Real-Time Content Recommendations at Twitter. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1281–1292.
  • Aneesh Sharma, C. Seshadhri, and Ashish Goel. 2017. When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors. In WWW ’17. 431–440.
  • Charalampos Tsourakakis. 2015. Provably Fast Inference of Latent Features from Networks: With Applications to Learning Social Circles and Multilabel Classification. In WWW ’15. 1111–1121.
  • Jaewon Yang and Jure Leskovec. 2013. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. In WSDM ’13. 587–596.
  • Jaewon Yang, Julian McAuley, and Jure Leskovec. 2014. Detecting Cohesive and 2-Mode Communities in Directed and Undirected Networks. In WSDM ’14. 323–332.
  • Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed Chi. 2019. Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations. In RecSys ’19. 269–277.
  • Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD ’18. 974–983.
  • Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized Entity Recommendation: A Heterogeneous Information Network Approach. In WSDM ’14. 283–292.
  • Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. 2017. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In CIKM ’17. 1449–1458.

The code for Neighborhood-aware MH and an in-memory implementation of Stage 1 are open-sourced at https://github.com/twitter/sbf.