GraphMix: Improved Training of GNNs for Semi-Supervised Learning

AAAI 2021, pp. 10024–10032

Abstract

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Coauthor-CS and Coauthor-Physics.

Introduction
  • Due to the presence of graph-structured data across a wide variety of domains, such as biological networks, citation networks and social networks, there have been several attempts to design neural networks, known as graph neural networks (GNN), that can process arbitrarily structured graphs.
  • These include, among others, the architectures proposed in (Velickovic et al. 2018, 2019; Qu, Bengio, and Tang 2019; Gao and Ji 2019; Ma et al. 2019)
  • Many of these approaches are designed for addressing the problem of semi-supervised learning over graph-structured data (Zhou et al. 2018).
  • The authors instead propose an architecture-agnostic method for regularized training of GNNs for semi-supervised node classification.
Highlights
  • Due to the presence of graph-structured data across a wide variety of domains, such as biological networks, citation networks and social networks, there have been several attempts to design neural networks, known as graph neural networks (GNN), that can process arbitrarily structured graphs
  • We show that with our proposed method, we can achieve state-of-the-art performance even when using simpler GNN architectures such as Graph Convolutional Networks (Kipf and Welling 2017), with no additional memory cost and with minimal additional computation cost
  • We conduct a theoretical analysis to demonstrate the effectiveness of the proposed method over the underlying GNNs
  • An important question is how these more discriminative node representations can be transferred to the GNN. One potential approach could involve maximizing the mutual information between the hidden states of the Fully-Connected Network (FCN) and the GNN, using formulations similar to those proposed by (Hjelm et al. 2019; Sun et al. 2020)
  • We propose parameter sharing between the FCN and the GNN to facilitate the transfer of discriminative node representations from the FCN to the GNN; a sketch follows this list
  • We observe that GraphMix always improves the accuracy of the underlying GNNs such as GCN, GAT and Graph-U-Net across all the datasets, with GraphMix(GCN) achieving the best results
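
To make the parameter-sharing idea concrete, below is a minimal PyTorch sketch of how an FCN and a GCN can share the same weight matrices: the FCN view applies the weights to raw node features, while the GCN view additionally aggregates over the normalized adjacency matrix. The module name, two-layer depth, and layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedGCNFCN(nn.Module):
    """One set of weights, two views: a plain MLP (FCN) and a GCN."""

    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)   # shared by both views
        self.w2 = nn.Linear(hid_dim, n_classes)

    def fcn_forward(self, x):
        # FCN view: ignores the graph, uses node features only.
        return self.w2(F.relu(self.w1(x)))

    def gcn_forward(self, x, adj_norm):
        # GCN view: same weights, but each layer aggregates neighbors
        # via adj_norm = D^(-1/2) (A + I) D^(-1/2).
        h = F.relu(adj_norm @ self.w1(x))
        return self.w2(adj_norm @ h)
```

Because gradients from both views flow into the same w1 and w2, the more discriminative representations learned by the FCN through interpolation-based regularization are transferred to the GCN at no additional memory cost.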
Methods
  • The authors first describe GraphMix at a high level and then give a more formal description. GraphMix augments the vanilla GNN with a Fully-Connected Network (FCN).
  • One potential approach could involve maximizing the mutual information between the hidden states of the FCN and the GNN, using formulations similar to those proposed by (Hjelm et al. 2019; Sun et al. 2020)
  • Using the more discriminative representations of the nodes from FCN, as well as the graph structure, the GNN loss is computed in the usual way to further refine the node representations
  • In this way the authors can exploit the improved representations from Manifold Mixup for training GNNs.
  • A recently proposed method for producing accurate predicted targets for unlabeled data averages the predictions across K random augmentations of the input sample (Berthelot et al. 2019). Along these lines, GraphMix computes the predicted targets as the average of the predictions made by the GNN on K dropout versions of the input sample.
  • An input dropout rate of 0.5 and a hidden dropout rate of 0.5 work best for Cora and Citeseer; an input dropout rate of 0.2 and a hidden dropout rate of 0.2 work best for Pubmed. A sketch of one GraphMix training step follows this list.
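
The following is a hedged sketch of one GraphMix training step, assuming the SharedGCNFCN module sketched above. Input-space Mixup is shown for brevity, whereas the paper uses Manifold Mixup (interpolating hidden states); the hyperparameter names (alpha, K, gamma), the fixed dropout rate, mixing only labeled nodes, and the omission of target sharpening are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def graphmix_step(model, x, adj_norm, y_onehot, labeled_idx, unlabeled_idx,
                  alpha=1.0, K=2, gamma=1.0):
    # 1) Predicted targets for unlabeled nodes: average the GNN's predictions
    #    over K dropout-perturbed versions of the input features.
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(model.gcn_forward(F.dropout(x, 0.5, training=True),
                                        adj_norm), dim=1)
            for _ in range(K)
        ]).mean(dim=0)
    targets = y_onehot.clone()
    targets[unlabeled_idx] = probs[unlabeled_idx]

    # 2) FCN loss with Mixup: interpolate pairs of nodes and their targets
    #    with a Beta(alpha, alpha)-distributed coefficient.
    lam = Beta(alpha, alpha).sample()
    perm = torch.randperm(labeled_idx.numel())
    i, j = labeled_idx, labeled_idx[perm]
    x_mix = lam * x[i] + (1 - lam) * x[j]
    y_mix = lam * targets[i] + (1 - lam) * targets[j]
    log_p = F.log_softmax(model.fcn_forward(x_mix), dim=1)
    fcn_loss = -(y_mix * log_p).sum(dim=1).mean()

    # 3) Usual GNN loss on the labeled nodes; parameter sharing ties the
    #    two losses together.
    gcn_out = model.gcn_forward(x, adj_norm)
    gcn_loss = F.cross_entropy(gcn_out[labeled_idx],
                               y_onehot[labeled_idx].argmax(dim=1))
    return gcn_loss + gamma * fcn_loss
```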
Results
  • The authors provide results on three recently proposed datasets which are relatively larger than standard benchmark datasets (Cora/Citeseer/Pubmed).
  • The authors use the Cora-Full dataset proposed in (Bojchevski and Günnemann 2018), and the Coauthor-CS and Coauthor-Physics datasets proposed in (Shchur et al. 2018).
  • GraphMix(GCN) improves the results over GCN for all three datasets by a significant margin, even though only minimal hyperparameter search was performed for GraphMix(GCN), as mentioned in Appendix A.8.
  • The details of the datasets are given in Appendix A.4
  • For semi-supervised link classification, the authors use two datasets, Bitcoin Alpha and Bitcoin OTC (Kumar et al. 2016, 2018), whose nodes correspond to Bitcoin users and whose edge weights correspond to the degree of trust between users. Following (Qu, Bengio, and Tang 2019), edges with weights greater than 3 are treated as positive instances and edges with weights less than -3 as negative ones; given a few labeled edges, the task is to predict the labels of the remaining edges (see the small sketch after this list). The statistics of these datasets, as well as the number of training/validation/test nodes, are presented in Appendix A.4.
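
As a small illustration of the edge-labeling rule above, the following hypothetical helper maps signed trust weights to binary labels; edges with weights in [-3, 3] are simply discarded.

```python
def label_edges(weighted_edges):
    """weighted_edges: iterable of (u, v, weight) tuples."""
    labeled = []
    for u, v, w in weighted_edges:
        if w > 3:
            labeled.append((u, v, 1))   # positive: strong trust
        elif w < -3:
            labeled.append((u, v, 0))   # negative: strong distrust
    return labeled

# Only the first and third edges survive the thresholds:
print(label_edges([(0, 1, 5), (1, 2, 2), (2, 3, -7)]))
# -> [(0, 1, 1), (2, 3, 0)]
```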
Conclusion
  • GraphMix is a simple and efficient regularizer for semi-supervised node classification using graph neural networks.
  • The authors' extensive experiments demonstrate state-of-the-art performance using GraphMix on benchmark datasets.
  • The authors' theoretical analysis compares the generalization bounds of GraphMix with those of the underlying GNNs. The strong empirical results of GraphMix suggest that, in parallel to designing new architectures, exploring better regularization for graph-structured data is a promising avenue for research.
  • A future research direction is to jointly model the node features and edges of the graph so that they can be further used for generating synthetic interpolated nodes and their corresponding connectivity to the other nodes in the graph
Tables
  • Table 1: Results of node classification (% test accuracy) on the standard split of datasets. [*] means the results are taken from the corresponding papers. We conduct 100 trials and report mean and standard deviation over the trials (refer to Table 8 in the Appendix for comparison with other methods on the standard Train/Validation/Test split)
  • Table 2: Results of node classification (% test accuracy) using 10 random Train/Validation/Test splits of datasets. We conduct 100 trials and report mean and standard deviation over the trials
  • Table 3: Comparison of GraphMix with other methods (% test accuracy), for Cora-Full, Coauthor-CS, Coauthor-Physics. ∗ refers to the results reported in (Shchur et al. 2018)
  • Table 4: Ablation study results using 10 labeled samples per class (% test accuracy). We report mean and standard deviation over ten trials. See Section A.5 for the meaning of the methods in the leftmost column
  • Table 5: Results on link classification (% F1 score). ∗ means the results are taken from the corresponding papers
  • Table 6: Dataset statistics
  • Table 7: Dataset statistics for the larger datasets
  • Table 8: Comparison of GraphMix with other methods (% test accuracy), for Cora, Citeseer and Pubmed
  • Table 9: Results using fewer labeled samples (% test accuracy). K refers to the number of labeled samples per class
Reference
  • Bartlett, P. L.; and Mendelson, S. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3(Nov): 463–482.
  • Beckham, C.; Honari, S.; Verma, V.; Lamb, A.; Ghadiri, F.; Devon Hjelm, R.; Bengio, Y.; and Pal, C. 2019. On Adversarial Mixup Resynthesis. arXiv e-prints arXiv:1903.02709.
  • Belkin, M.; Niyogi, P.; and Sindhwani, V. 2006. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. J. Mach. Learn. Res. 7: 2399–2434. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1248547.1248632.
  • Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; and Raffel, C. 2019. MixMatch: A Holistic Approach to Semi-Supervised Learning. arXiv e-prints arXiv:1905.02249.
  • Blum, A.; and Mitchell, T. 1998. Combining Labeled and Unlabeled Data with Co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT’ 98, 92–100. New York, NY, USA: ACM. ISBN 1-58113-057-0. doi:10.1145/279943.279962. URL http://doi.acm.org/10.1145/279943.279962.
  • Bojchevski, A.; and Günnemann, S. 2018. Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking. In International Conference on Learning Representations. URL https://openreview.net/forum?id=r1ZdKJ-0W.
  • Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral Networks and Locally Connected Networks on Graphs. CoRR abs/1312.6203.
  • Chapelle, O.; Schölkopf, B.; and Zien, A. 2010. Semi-Supervised Learning. The MIT Press, 1st edition. ISBN 0262514125, 9780262514125.
  • Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; and Garnett, R., eds., Advances in Neural Information Processing Systems 29, 3844–3852.
  • Deng, Z.; Dong, Y.; and Zhu, J. 2019. Batch Virtual Adversarial Training for Graph Convolutional Networks. CoRR abs/1902.09192. URL http://arxiv.org/abs/1902.09192.
  • Devries, T.; and Taylor, G. W. 2017. Improved Regularization of Convolutional Neural Networks with Cutout. CoRR abs/1708.04552. URL http://arxiv.org/abs/1708.04552.
  • Ding, M.; Tang, J.; and Zhang, J. 2018. Semi-supervised Learning on Graphs with Generative Adversarial Nets. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM ’18, 913– 922. New York, NY, USA: ACM. ISBN 978-1-4503-6014-2. doi:10.1145/3269206.3271768. URL http://doi.acm.org/10.1145/3269206.3271768.
  • Feng, F.; He, X.; Tang, J.; and Chua, T. 2019. Graph Adversarial Training: Dynamically Regularizing Based on Graph Structure. CoRR abs/1902.08226. URL http://arxiv.org/abs/1902.08226.
  • Gao, H.; and Ji, S. 2019. Graph U-Nets. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 2083–2092. Long Beach, California, USA: PMLR. URL http://proceedings.mlr.press/v97/gao19a.html.
  • Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
  • Gori, M.; Monfardini, G.; and Scarselli, F. 2005. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, 729–734. IEEE.
  • Grandvalet, Y.; and Bengio, Y. 2005. Semi-supervised Learning by Entropy Minimization. In Saul, L. K.; Weiss, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems 17, 529–536.
  • Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS.
  • Henaff, M.; Bruna, J.; and LeCun, Y. 2015. Deep Convolutional Networks on Graph-Structured Data. ArXiv abs/1506.05163.
  • Hjelm, R. D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; and Bengio, Y. 2019. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations. URL https://openreview.net/forum?id=Bklr3j0cKX.
  • Jeong, J.; Verma, V.; Hyun, M.; Kannala, J.; and Kwak, N. 2020. Interpolation-based semi-supervised learning for object detection.
  • Kipf, T. N.; and Welling, M. 2016. Variational graph autoencoders. arXiv preprint arXiv:1611.07308.
  • Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
  • Ko, T.; Peddinti, V.; Povey, D.; and Khudanpur, S. 2015. Audio augmentation for speech recognition. In INTERSPEECH.
  • Kumar, S.; Hooi, B.; Makhija, D.; Kumar, M.; Faloutsos, C.; and Subrahmanian, V. 2018. Rev2: Fraudulent user prediction in rating platforms. In WSDM.
  • Kumar, S.; Spezzano, F.; Subrahmanian, V.; and Faloutsos, C. 2016. Edge weight prediction in weighted signed networks. In ICDM.
  • Laine, S.; and Aila, T. 2016. Temporal Ensembling for Semi-Supervised Learning. CoRR abs/1610.02242. URL http://arxiv.org/abs/1610.02242.
  • Lee, D.-H. 2013. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks.
  • Li, Q.; Han, Z.; and Wu, X.-M. 2018. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In AAAI.
  • Lu, Q.; and Getoor, L. 2003. Link-based Classification. In Proceedings of the Twentieth International Conference on International Conference on Machine Learning, ICML’03, 496–503. AAAI Press. ISBN 1-57735-189-4. URL http://dl.acm.org/citation.cfm?id=3041838.3041901.
  • Ma, J.; Cui, P.; Kuang, K.; Wang, X.; and Zhu, W. 2019. Disentangled Graph Convolutional Networks. In ICML.
  • Miyato, T.; Maeda, S.-i.; Koyama, M.; and Ishii, S. 2018. Virtual Adversarial Training: a Regularization Method for Supervised and Semi-supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; and Bronstein, M. M. 2016. Geometric deep learning on graphs and manifolds using mixture model CNNs. CoRR abs/1611.08402. URL http://arxiv.org/abs/1611.08402.
  • Park, D. S.; Chan, W.; Zhang, Y.; Chiu, C.-C.; Zoph, B.; Cubuk, E. D.; and Le, Q. V. 2019. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv e-prints arXiv:1904.08779.
  • Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In KDD.
  • Qu, M.; Bengio, Y.; and Tang, J. 2019. GMNN: Graph Markov Neural Networks. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 5241–5250. Long Beach, California, USA: PMLR.
  • Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009. The Graph Neural Network Model. Trans. Neur. Netw. 20(1): 61–80. ISSN 1045-9227. doi:10.1109/TNN.2008.2005605. URL http://dx.doi.org/10.1109/TNN.2008.2005605.
  • Shchur, O.; Mumme, M.; Bojchevski, A.; and Günnemann, S. 2018. Pitfalls of Graph Neural Network Evaluation. CoRR abs/1811.05868. URL http://arxiv.org/abs/1811.05868.
  • Sun, F.-Y.; Hoffman, J.; Verma, V.; and Tang, J. 2020. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In International Conference on Learning Representations. URL https://openreview.net/forum?id=r1lfF2NYvH.
  • Tarvainen, A.; and Valpola, H. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30, 1195–1204.
  • Taskar, B.; Wong, M.-F.; Abbeel, P.; and Koller, D. 2004. Link prediction in relational data. In NIPS.
  • van der Maaten, L.; and Hinton, G. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9: 2579–2605. URL http://www.jmlr.org/papers/v9/vandermaaten08a.html.
  • Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In ICLR.
  • Velickovic, P.; Fedus, W.; Hamilton, W. L.; Liò, P.; Bengio, Y.; and Hjelm, R. D. 2019. Deep graph infomax. In ICLR.
  • Verma, S.; and Zhang, Z.-L. 2019. Stability and generalization of graph convolutional neural networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1539–1548.
  • Verma, V.; Lamb, A.; Beckham, C.; Najafi, A.; Mitliagkas, I.; Lopez-Paz, D.; and Bengio, Y. 2019a. Manifold Mixup: Better Representations by Interpolating Hidden States. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 6438–6447. Long Beach, California, USA: PMLR. URL http://proceedings.mlr.press/v97/verma19a.html.
  • Verma, V.; Lamb, A.; Kannala, J.; Bengio, Y.; and Lopez-Paz, D. 2019b. Interpolation Consistency Training for Semi-supervised Learning. In Kraus, S., ed., Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. ijcai.org. doi:10.24963/ijcai.2019. URL https://doi.org/10.24963/ijcai.2019.
  • Weston, J.; Ratle, F.; Mobahi, H.; and Collobert, R. 2012. Deep Learning via Semi-Supervised Embedding. In Montavon, G.; Orr, G.; and Müller, K. R., eds., In Neural Networks: Tricks of the Trade. Springer, second edition.
  • Xie, Z.; Wang, S. I.; Li, J.; Lévy, D.; Nie, A.; Jurafsky, D.; and Ng, A. Y. 2017. Data Noising as Smoothing in Neural Network Language Models. ArXiv abs/1703.02573.
  • Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.-i.; and Jegelka, S. 2018. Representation Learning on Graphs with Jumping Knowledge Networks. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 5453–5462. Stockholmsmässan, Stockholm, Sweden: PMLR. URL http://proceedings.mlr.press/v80/xu18c.html.
  • Yang, Z.; Cohen, W.; and Salakhutdinov, R. 2016. Revisiting Semi-Supervised Learning with Graph Embeddings. In ICML.
  • Zhang, H.; Cisse, M.; Dauphin, Y. N.; and Lopez-Paz, D. 2018. mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations. URL https://openreview.net/forum?id=r1Ddp1-Rb.
  • Zhou, J.; Cui, G.; Zhang, Z.; Yang, C.; Liu, Z.; and Sun, M. 2018. Graph Neural Networks: A Review of Methods and Applications. CoRR abs/1812.08434. URL http://arxiv.org/abs/1812.08434.
  • Zhu, X.; and Ghahramani, Z. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. Technical report.
  • Zhu, X.; Ghahramani, Z.; and Lafferty, J. D. 2003. Semisupervised learning using gaussian fields and harmonic functions. In ICML.