# Graph Information Bottleneck

NeurIPS 2020

Abstract

Representation learning of graph-structured data is challenging because both graph structure and node features carry important information. Graph Neural Networks (GNNs) provide an expressive way to fuse information from network structure and node features. However, GNNs are prone to adversarial attacks. Here we introduce Graph Information Bottleneck (GIB), an information-theoretic principle adapted for representation learning on graph-structured data.

Introduction

- Representation learning on graphs aims to learn representations of graph-structured data for downstream tasks such as node classification and link prediction [1, 2].
- Graph representation learning is challenging since both node features and graph structure carry important information [3, 4].
- Graph Neural Networks (GNNs) [1, 3, 5,6,7] have demonstrated impressive performance, by learning to fuse information from both the node features and the graph structure [8].
- GNNs' reliance on message passing over the edges of the graph makes them prone to noise and adversarial attacks that target the graph structure [15, 16].

Highlights

- We introduce Graph Information Bottleneck (GIB), an information-theoretic principle inherited from IB, adapted for representation learning on graph-structured data
- We consider two questions: (1) Boosted by GIB, do GIB-Cat and GIB-Bern learn more robust representations than Graph Attention Networks (GAT) to defend against attacks? (2) How does each component of GIB contribute to this robustness, especially in controlling the information from each of the two sides, the structure and the node features?
- We have introduced Graph Information Bottleneck (GIB), an information-theoretic principle for learning representations that capture minimal sufficient information from graph-structured data
- We have demonstrated the efficacy of GIB by evaluating the robustness of the GAT model trained under the GIB principle on adversarial attacks
- Open questions remain: Are there other, better instantiations of GIB, especially for capturing discrete structural information? If incorporated with a node for global aggregation, can GIB break the limitation of the local-dependence assumption? Can GIB be applied to other graph-related tasks, including link prediction and graph classification?
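The minimal-sufficient-information principle in the highlights can be stated compactly. A sketch of the GIB objective in standard IB form, where D = (A, X) denotes the graph data (structure and features), Z the learned representation, Y the targets, and β the trade-off coefficient (the notation here follows the usual IB convention and is not necessarily the paper's exact formulation):

```latex
\min_{\mathbb{P}(Z \mid D)}
  \; \underbrace{-\,I(Y; Z)}_{\text{sufficiency: predict } Y}
  \;+\; \beta\, \underbrace{I(D; Z)}_{\text{minimality: compress } D}
```

The first term encourages Z to retain information relevant to the task, while the second penalizes information absorbed from the raw graph data, which is what limits the influence of structural or feature perturbations.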

Methods

- The goal of the experiments is to test whether GNNs trained with the GIB objective are more robust and reliable.
- For GCNJaccard and RGCN, the authors perform extensive hyperparameter search as detailed in Appendix G.3.
- For GIB-Cat and GIB-Bern, the authors keep the same architectural components as GAT; for the additional hyperparameters k and T (Algorithms 1, 2, and 3), they search k ∈ {2, 3} and T ∈ {1, 2} for each experimental setting and report the better performance.
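The search over k and T described above is a small exhaustive grid. A minimal sketch (the `train_and_eval` callable is hypothetical, standing in for training GIB-Cat or GIB-Bern under one configuration and returning a validation score):

```python
from itertools import product

def grid_search(train_and_eval, ks=(2, 3), ts=(1, 2)):
    """Try every (k, T) pair and keep the best-scoring configuration."""
    best_score, best_cfg = float("-inf"), None
    for k, t in product(ks, ts):
        score = train_and_eval(k=k, T=t)  # hypothetical trainer/evaluator
        if score > best_score:
            best_score, best_cfg = score, (k, t)
    return best_cfg, best_score
```

With only four combinations, repeating the full grid per experimental setting, as the authors do, is cheap relative to training itself.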

Results

- GIB-based models empirically achieve up to 31% improvement under adversarial perturbation of the graph structure as well as node features.
- GIB-Cat and GIB-Bern improve classification accuracy by up to 31.3% and 34.0% under adversarial perturbation, respectively.
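Robustness to feature perturbation, as evaluated above, amounts to corrupting the inputs before inference and measuring the accuracy drop. A minimal sketch using i.i.d. Gaussian noise (the noise model and scale here are illustrative assumptions, not the paper's exact perturbation setup):

```python
import random

def add_feature_noise(features, scale, seed=None):
    """Return a copy of the node feature matrix with Gaussian noise added.

    features: list of per-node feature vectors (list of lists of floats)
    scale: standard deviation of the additive noise
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, scale) for x in row] for row in features]
```

An evaluation loop would sweep `scale` upward and re-score the trained model at each level, as in the feature-noise experiment summarized in Table 3.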

Conclusion

- The authors have introduced Graph Information Bottleneck (GIB), an information-theoretic principle for learning representations that capture minimal sufficient information from graph-structured data.
- GNNs share a common issue with other neural-network-based techniques: they are sensitive to noise in the data and fragile under model attacks.
- The Graph Information Bottleneck (GIB) principle proposed in this work paves a principled way to alleviate this problem by increasing the robustness of GNN models.
- The authors' work alleviates concerns about using GNN techniques in practical systems, such as recommender systems and social media, and in analyzing data for other disciplines, including physics, biology, and social science.
- The authors' work strengthens the interaction between AI and machine learning techniques and other parts of society, and could achieve far-reaching impact.

- Table1: Average classification accuracy (%) for the targeted nodes under direct attack. Each number is the average accuracy for the 40 targeted nodes for 5 random initialization of the experiments. Bold font denotes top two models
- Table2: Average classification accuracy (%) for the ablations of GIB-Cat and GIB-Bern on Cora dataset
- Table3: Classification F1-micro (%) for the trained models with increasing additive feature noise. Bold font denotes top 2 models
- Table4: Summary of the datasets and splits in our experiments
- Table5: Hyperparameter scope for Section 5.1 and 5.2 for GIB-Cat and GIB-Bern
- Table6: Hyperparameter for adversarial attack experiment for GIB-Cat and GIB-Bern
- Table7: Hyperparameter for adversarial attack experiment for the ablations of GIB-Cat and GIB-Bern
- Table8: Hyperparameter for feature attack experiment (Section 5.2) for GIB-Cat and GIB-Bern
- Table9: Hyperparameter of baselines used on Citeseer dataset
- Table10: Hyperparameter of baselines used on Cora dataset
- Table11: Hyperparameter of baselines used on Pubmed dataset
- Table12: Average classification accuracy (%) for the targeted nodes under direct attack for Cora
- Table13: Statistics of the target nodes and adversarial perturbations by Nettack in Section 5.1

Related work

- GNNs learn node-level representations through message passing and aggregation from neighbors [1, 3, 29, 30, 31]. Several previous works further incorporate the attention mechanism to adaptively learn the correlation between a node and its neighbors [5, 32]. Recent literature shows that representations learned by GNNs are far from robust and can be easily attacked by malicious manipulation of either features or structure [15, 16]. Accordingly, several defense models have been proposed to increase robustness by injecting random noise into the representations [33], removing suspicious and uninformative edges [34], applying low-rank approximation to the adjacency matrix [35], or adding a hinge loss for certified robustness [36]. In contrast, even though not specifically designed against adversarial attacks, our model learns robust representations via the GIB principle that naturally defend against attacks. Moreover, none of those defense models has theoretical foundations except [36], which uses tools from robust optimization instead of information theory.
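The message-passing-and-aggregation scheme described above can be sketched with plain mean aggregation over neighbors (a simplified stand-in for the cited models' actual update rules; no attention weights or learned parameters):

```python
def message_passing_step(features, adj):
    """One round of mean-aggregation message passing.

    features: dict mapping node -> feature vector (list of floats)
    adj: dict mapping node -> list of neighbor nodes
    Each node's new representation averages its own features with
    the element-wise mean of its neighbors' features.
    """
    new_features = {}
    for node, feat in features.items():
        neighbors = adj.get(node, [])
        if neighbors:
            # element-wise mean of the neighbors' feature vectors
            agg = [sum(features[n][i] for n in neighbors) / len(neighbors)
                   for i in range(len(feat))]
        else:
            agg = feat  # isolated node: nothing to aggregate
        # combine self features and the aggregated message
        new_features[node] = [(a + b) / 2 for a, b in zip(feat, agg)]
    return new_features
```

Because every update mixes in neighbor information, an adversarial edge directly injects its endpoint's features into the target's representation, which is the vulnerability the defense models above (and GIB) aim to limit.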

Funding

- Hongyu Ren is supported by the Masason Foundation Fellowship
- We also gratefully acknowledge the support of DARPA under Nos

References

- W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in neural information processing systems, 2017.
- T. N. Kipf and M. Welling, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308, 2016.
- T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
- P. Li, I. Chien, and O. Milenkovic, “Optimizing generalized pagerank methods for seed-expansion community detection,” in Advances in Neural Information Processing Systems, 2019.
- P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018.
- J. Chen, T. Ma, and C. Xiao, “FastGCN: Fast learning with graph convolutional networks via importance sampling,” in International Conference on Learning Representations, 2018.
- J. Klicpera, A. Bojchevski, and S. Günnemann, “Predict then propagate: Graph neural networks meet personalized pagerank,” in International Conference on Learning Representations, 2019.
- K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in International Conference on Learning Representations, 2019.
- J. You, R. Ying, and J. Leskovec, “Position-aware graph neural networks,” in International Conference on Machine Learning, 2019.
- H. Pei, B. Wei, K. C.-C. Chang, Y. Lei, and B. Yang, “Geom-gcn: Geometric graph convolutional networks,” in International Conference on Learning Representations, 2020.
- H. Maron, H. Ben-Hamu, H. Serviansky, and Y. Lipman, “Provably powerful graph networks,” in Advances in Neural Information Processing Systems, 2019.
- R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro, “Relational pooling for graph representations,” in International Conference on Machine Learning, 2019.
- Z. Chen, S. Villar, L. Chen, and J. Bruna, “On the equivalence between graph isomorphism testing and function approximation with gnns,” in Advances in Neural Information Processing Systems, 2019.
- Y. Hou, J. Zhang, J. Cheng, K. Ma, R. T. B. Ma, H. Chen, and M.-C. Yang, “Measuring and improving the use of graph information in graph neural networks,” in International Conference on Learning Representations, 2020.
- D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
- H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” arXiv preprint arXiv:1806.02371, 2018.
- T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
- N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.
- N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in 2015 IEEE Information Theory Workshop (ITW). IEEE, 2015.
- P. A. M. Dirac, The principles of quantum mechanics. Oxford university press, 1981, no. 27.
- A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” arXiv preprint arXiv:1612.00410, 2016.
- B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, and G. Tucker, “On variational bounds of mutual information,” in International Conference on Machine Learning, 2019.
- X. Nguyen, M. J. Wainwright, and M. I. Jordan, “Estimating divergence functionals and the likelihood ratio by convex risk minimization,” IEEE Transactions on Information Theory, 2010.
- E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with gumbel-softmax,” in International Conference on Learning Representations, 2017.
- C. J. Maddison, A. Mnih, and Y. W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” in International Conference on Learning Representations, 2017.
- I. Fischer and A. A. Alemi, “Ceb improves model robustness,” arXiv preprint arXiv:2002.05380, 2020.
- N. Dilokthanakul, P. A. Mediano, M. Garnelo, M. C. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, “Deep unsupervised clustering with gaussian mixture variational autoencoders,” arXiv preprint arXiv:1611.02648, 2016.
- A. v. d. Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.
- J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
- R. Li, S. Wang, F. Zhu, and J. Huang, “Adaptive graph convolutional neural networks,” in Thirty-second AAAI conference on artificial intelligence, 2018.
- K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka, “Representation learning on graphs with jumping knowledge networks,” arXiv preprint arXiv:1806.03536, 2018.
- J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D.-Y. Yeung, “Gaan: Gated attention networks for learning on large and spatiotemporal graphs,” arXiv preprint arXiv:1803.07294, 2018.
- D. Zhu, Z. Zhang, P. Cui, and W. Zhu, “Robust graph convolutional networks against adversarial attacks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.
- H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu, “Adversarial examples for graph data: Deep insights into attack and defense,” in International Joint Conference on Artificial Intelligence, IJCAI, 2019.
- N. Entezari, S. A. Al-Sayouri, A. Darvishzadeh, and E. E. Papalexakis, “All you need is low (rank) defending against adversarial attacks on graphs,” in Proceedings of the 13th International Conference on Web Search and Data Mining, 2020.
- D. Zügner and S. Günnemann, “Certifiable robustness and robust training for graph convolutional networks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.
- P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax,” arXiv preprint arXiv:1809.10341, 2018.
- Z. Peng, W. Huang, M. Luo, Q. Zheng, Y. Rong, T. Xu, and J. Huang, “Graph representation learning via graphical mutual information maximization,” in Proceedings of The Web Conference 2020, 2020.
- F.-Y. Sun, J. Hoffmann, and J. Tang, “Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization,” arXiv preprint arXiv:1908.01000, 2019.
- X. B. Peng, A. Kanazawa, S. Toyer, P. Abbeel, and S. Levine, “Variational discriminator bottleneck: Improving imitation learning, inverse rl, and gans by constraining information flow,” arXiv preprint arXiv:1810.00821, 2018.
- I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework.” in International Conference on Learning Representations, 2017.
- R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio, “Learning deep representations by mutual information estimation and maximization,” in International Conference on Learning Representations, 2019.
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI magazine, 2008.
- E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011.
- O. Mason and M. Verwoerd, “Graph theory and networks in biology,” IET systems biology, 2007.
- M. Barthélemy, “Spatial networks,” Physics Reports, 2011.
- I. Kaastra and M. Boyd, “Designing a neural network for forecasting financial and economic time series,” Neurocomputing, 1996.
- R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional neural networks for web-scale recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
- I. Fischer, “The conditional entropy bottleneck,” arXiv preprint arXiv:2002.05379, 2020.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019.
- M. Fey and J. E. Lenssen, “Fast graph representation learning with PyTorch Geometric,” in ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
