Learning latent representations of nodes for classifying in heterogeneous social networks
WSDM, pp.373-382, (2014)
EI
Abstract
Social networks are heterogeneous systems composed of different types of nodes (e.g. users, content, groups) and relations (e.g. social or similarity relations). While learning and performing inference on homogeneous networks have motivated a large amount of research, little work exists on heterogeneous networks and there are open and ...
Introduction
- Social Media on the Web are most often complex heterogeneous networks with nodes and relations between nodes of different types, corresponding to different objects, concepts and relationships.
- Many existing approaches rely on the idea of mapping a heterogeneous network onto a homogeneous network so that classical relational techniques can be used [8, 10, 14, 2].
- They do not fully exploit the correlations between the different node labels or characteristics.
- The authors consider a transductive context where the network is composed of $\ell < N$ labeled nodes $(x_1, \ldots, x_\ell)$ with $\forall i \in \{1, \ldots, \ell\}$, $y_i \in \mathbb{R}^{C_{t_i}}$, and $N - \ell$ unlabeled nodes.
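The transductive setting described in the last point can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the `Node` class, the `split_transductive` helper and the type names `"user"` and `"photo"` are hypothetical and do not come from the paper. The only idea taken from the text is that each node has a type, that each type has its own number of categories, and that some nodes carry a label vector while the others do not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    node_type: str                       # node type, e.g. "user" or "photo" (illustrative names)
    label: Optional[List[float]] = None  # label vector; None marks an unlabeled node


def split_transductive(
    nodes: List[Node], num_categories: Dict[str, int]
) -> Tuple[List[int], List[int]]:
    """Return (labeled indices, unlabeled indices), checking label sizes per node type."""
    labeled, unlabeled = [], []
    for index, node in enumerate(nodes):
        if node.label is None:
            unlabeled.append(index)
            continue
        # A labeled node must have one score per category of its own type.
        expected = num_categories[node.node_type]
        if len(node.label) != expected:
            raise ValueError(
                f"node {index}: expected {expected} label scores, got {len(node.label)}"
            )
        labeled.append(index)
    return labeled, unlabeled
```

In this sketch the labeled nodes need not come first in the list; the helper simply returns the two index sets, and both sets belong to the same graph at training time, which is what makes the setting transductive.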
Highlights
- Social Media on the Web are most often complex heterogeneous networks with nodes and relations between nodes of different types, corresponding to different objects, concepts and relationships
- We consider the task of node classification in heterogeneous networks composed of different types of nodes, each node type being associated with its own set of labels
- Much work has been devoted to classification for homogeneous networks composed of a single node type, e.g. [3, 17, 19, 27] to cite a few
- We have proposed a new model able to label nodes of heterogeneous networks where the nodes are of different types, each type corresponding to a particular set of possible categories
- Our experiments on three datasets show that the proposed model outperforms classical approaches, and qualitative analysis shows that the proposed method effectively captures the inter-dependencies between the labels of different types of nodes
- Different extensions of this model are currently being investigated: the first one is to deal with multi-relational heterogeneous networks – i.e. heterogeneous networks where two nodes can be connected with more than one relation – and dynamic heterogeneous networks
Conclusion
- The authors have proposed a new model able to label nodes of heterogeneous networks where the nodes are of different types, each type corresponding to a particular set of possible categories.
- The authors' experiments on three datasets show that the proposed model outperforms classical approaches, and qualitative analysis shows that the proposed method effectively captures the inter-dependencies between the labels of different types of nodes.
- Different extensions of this model are currently being investigated: the first one is to deal with multi-relational heterogeneous networks – i.e. heterogeneous networks where two nodes can be connected with more than one relation – and dynamic heterogeneous networks.
- The authors are working on an extension that allows sparse representations of the nodes in the graph to be learned.
Tables
- Table 1: Statistics on the three datasets
- Table 2: Accuracy over the DBLP datasets with a latent space of size 30
- Table 3: P@1 over the Flickr datasets with a latent space of size 200. The baseline denoted Mapping to Homogeneous Model (MTH) uses multiple homogeneous graphs, one for each node type, to represent the heterogeneous network. For each homogeneous problem, the model proposed in [6], which minimizes the loss in Equation 1, is used. MTH does not make use of the correlations between the labels of different node types
- Table 4: a) Accuracy on DBLP No Content depending on the representation size Z
- Table 5: P@1 and P@k on Flickr depending on the representation size Z w.r.t. MTH and ULS
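The Flickr tables report P@1 and P@k. The page does not define these measures, so the sketch below assumes the standard information-retrieval definition of precision at k: the fraction of the k highest-scored labels that are actually correct for a node. The function name `precision_at_k` and its arguments are illustrative, not taken from the paper.

```python
from typing import List, Set


def precision_at_k(scores: List[float], relevant: Set[int], k: int) -> float:
    """Fraction of the k highest-scored label indices that belong to `relevant`."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of labels")
    # Rank label indices by decreasing score and keep the first k.
    top_k = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    hits = sum(1 for c in top_k if c in relevant)
    return hits / k
```

For a node with label scores `[0.1, 0.7, 0.2, 0.5]` and true labels `{1, 2}`, P@1 is 1.0 because the best-scored label (index 1) is correct, while P@2 is 0.5 because the second-best label (index 3) is not. A dataset-level figure would then be an average of this quantity over the evaluated nodes.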
Related Work
- Graph node classification has motivated a lot of work during the last decade and different models have been proposed. Two main families of models can be distinguished.
(i) Collective classification techniques are extensions of inductive learning to relational data. They consider node attributes, labels, and their dependencies. The classification problem is formulated as an optimal assignment of labels to the vertices of a graph. Since exact algorithms cannot be used in general for this combinatorial problem, approximate iterative algorithms have been developed. Sen et al. [19] provide a general introduction and a comparison of these models. They distinguish between local and global models. The former, such as Iterative Classification [19] and its variants like SICA [18], Gibbs Sampling [17], or Stacked Learning [15], make use of local classifiers taking as input the node attributes and statistics on the neighbors' labels. The latter attempt to optimize a global function using graphical models, e.g. Markov Random Fields trained, for example, using loopy belief propagation. In practice, Sen et al. advocate the use of simple local models, which offer performance similar to that of more complex graphical models and do not suffer from the convergence problems exhibited by the latter. All these methods have been proposed for homogeneous networks, whereas heterogeneous classification was explicitly mentioned as an open problem.
(ii) The second family of models consists of semi-supervised and transductive regularized models ([28], [3] and [27]), which are based on the minimization of an objective function that encourages connected nodes to have the same labels. This family of models was initially proposed for pure relational classification (no content associated with nodes) and for homogeneous networks. It has been extensively used in many different contexts.
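To make the second family concrete, here is a minimal sketch of one well-known member of it, the iterative label-spreading scheme of "Learning with local and global consistency" [27], written for a homogeneous graph. It is not the model proposed in the paper summarized here. The function name `propagate_labels`, the default `alpha=0.9` and the fixed iteration count are assumptions made for illustration.

```python
import math
from typing import List


def propagate_labels(
    W: List[List[float]], Y: List[List[float]], alpha: float = 0.9, iterations: int = 200
) -> List[int]:
    """Spread seed labels Y over a graph with symmetric non-negative weights W.

    Repeats F <- alpha * S * F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2},
    then returns the highest-scoring class index for every node.
    """
    n = len(W)
    num_classes = len(Y[0])
    degree = [sum(row) for row in W]
    # Symmetrically normalised adjacency; entries with no edge stay at zero.
    S = [
        [W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [
            [
                alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
                for c in range(num_classes)
            ]
            for i in range(n)
        ]
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]
```

Rows of `Y` are one-hot for labeled nodes and all zeros for unlabeled ones. The `alpha` term pulls each node's scores toward those of its neighbors, which is the "connected nodes should share labels" regularizer, while the `(1 - alpha)` term keeps labeled nodes close to their known labels.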
Extensions for handling node content information have been developed with applications to social network labeling [6] and Web-spam detection [1]. Several extensions have been proposed for dealing with more complex networks. For example, algorithms have been developed for multi-relational graphs, where nodes are all of the same type but can be connected through multiple relations that have different influences on the label propagation. These methods can learn the weights of the different relations from the data, and the multi-graph is then reduced to a simple graph by combining the different types of relations [25, 12, 9]. Note that, besides node classification, methods have also been proposed for other tasks such as link prediction [7].
Mining heterogeneous networks is a much more recent domain, and different tasks have been addressed: classification of nodes [11, 10, 8, 2], link prediction [5], [24], influence analysis [16], clustering [20], and entity similarity search [26], [21]. We will concentrate here on classification, which has been addressed in only a few papers.
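The reduction of a multi-relational graph to a simple graph mentioned above can be sketched as a weighted sum of per-relation adjacency matrices. This is a simplified illustration: the cited methods learn the relation weights from data, whereas the sketch takes them as given. The function name `combine_relations` and the relation names used in the example are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def combine_relations(adjacencies: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Collapse several relations over the same node set into one weighted graph."""
    size = len(next(iter(adjacencies.values())))
    combined = [[0.0] * size for _ in range(size)]
    for relation, matrix in adjacencies.items():
        weight = weights[relation]  # assumed given here; learned from data in [25, 12, 9]
        for i in range(size):
            for j in range(size):
                combined[i][j] += weight * matrix[i][j]
    return combined
```

For example, with a `"friend"` relation weighted 0.5 and a `"comment"` relation weighted 2.0, an edge present in both relations ends up with weight 2.5 in the combined graph, on which any single-graph label propagation method can then be run.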
Funding
- This work has been partially supported by the REMI FUI project and the ANR (French National Research Agency) MLVIS project
References
- [1] Jacob Abernethy, Olivier Chapelle, and Carlos Castillo. Witch: A new approach to web spam detection. In Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2008.
- [2] Ralitsa Angelova, Gjergji Kasneci, and Gerhard Weikum. Graffiti: graph-based classification in heterogeneous networks. World Wide Web, 15(2):139–170, 2012.
- [3] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res., 7:2399–2434, December 2006.
- [4] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases. In AAAI, 2011.
- [5] Darcy Davis, Ryan Lichtenwalter, and N.V. Chawla. Multi-Relational Link Prediction in Heterogeneous Information Networks. In ASONAM, 2011.
- [6] Ludovic Denoyer and Patrick Gallinari. A ranking based model for automatic image annotation in a social network. In ICWSM, 2010.
- [7] Sheng Gao, Ludovic Denoyer, and Patrick Gallinari. Link pattern prediction with tensor decomposition in multi-relational networks. In CIDM, 2011.
- [8] Taehyun Hwang and Rui Kuang. A heterogeneous label propagation algorithm for disease gene discovery. In SDM, page 12, 2010.
- [9] Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. Classification and annotation in social corpora using multiple relations. In CIKM, pages 1215–1220, 2011.
- [10] Ming Ji, Jiawei Han, and Marina Danilevsky. Ranking-based classification of heterogeneous information networks. In KDD, pages 1298–1306. ACM, 2011.
- [11] Ming Ji, Yizhou Sun, Marina Danilevsky, Jiawei Han, and Jing Gao. Graph regularized transductive classification on heterogeneous information networks. In ECML PKDD, volume 53, pages 570–586, 2010.
- [12] T. Kato, H. Kashima, and M. Sugiyama. Integration of multiple networks for robust label propagation. In SIAM Conf. on Data Mining, pages 716–726, 2008.
- [13] Xiangnan Kong, Bokai Cao, and Philip S. Yu. Multi-label classification by mining label and instance correlations from heterogeneous information networks. KDD ’13, pages 614–622, 2013.
- [14] Xiangnan Kong, Philip S. Yu, Ying Ding, and David J. Wild. Meta path-based collective classification in heterogeneous information networks. In CIKM, pages 1567–1571, 2012.
- [15] Zhenzhen Kou. Stacked graphical models for efficient inference in Markov random fields. In Proc. of the 2007 SIAM International Conf. on Data Mining, 2007.
- [16] Lu Liu, Jie Tang, Jiawei Han, and Meng Jiang. Mining topic-level influence in heterogeneous networks. In CIKM, 2010.
- [17] Sofus A. Macskassy and Foster Provost. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003, pages 64–76, 2003.
- [18] Francis Maes, Stephane Peters, Ludovic Denoyer, and Patrick Gallinari. Simulated iterative classification: a new learning procedure for graph labeling. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases Part II, II:47–62, 2009.
- [19] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–106, 2008.
- [20] Yizhou Sun, Charu C. Aggarwal, and Jiawei Han. Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. PVLDB, 5(5):394–405, 2012.
- [21] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. PVLDB, 4(11):992–1003, 2011.
- [22] Yizhou Sun, Yintao Yu, and Jiawei Han. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD, pages 797–806, 2009.
- [23] Lei Tang, Xufei Wang, and Huan Liu. Community detection via heterogeneous interaction analysis. Data Min. Knowl. Discov., 25(1):1–33, 2012.
- [24] Chi Wang, Rajat Raina, David Fong, Ding Zhou, Jiawei Han, and Greg J. Badros. Learning relevance from heterogeneous social network and its application in online targeting. In SIGIR, pages 655–664, 2011.
- [25] Meng Wang, Xian-Sheng Hua, Richang Hong, Jinhui Tang, Guo-Jun Qi, and Yan Song. Unified video annotation via multigraph learning. Circuits and Systems for Video Technology, IEEE Transactions on, 19(5):733–746, May 2009.
- [26] Xiao Yu, Yizhou Sun, Brandon Norick, Tiancheng Mao, and Jiawei Han. User guided entity similarity search using meta-path selection in heterogeneous information networks. CIKM ’12, pages 2025–2029, New York, NY, USA, 2012. ACM.
- [27] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Inform. Process. Systems 16. 2004.
- [28] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning from labeled and unlabeled data on a directed graph. In Proc. of the 22nd intern. conf. on Mach. learn., ICML ’05, pages 1036–1043, 2005.