# On Learning Ising Models under Huber's Contamination Model

NeurIPS 2020


Abstract

We study the problem of learning Ising models in a setting where some of the samples from the underlying distribution can be arbitrarily corrupted. In such a setup, we aim to design statistically optimal estimators in a high-dimensional scaling in which the number of nodes p, the number of edges k and the maximal node degree d are allowed to increase to infinity as a function of the sample size n.
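In Huber's ε-contamination model, each sample is drawn from the mixture (1 − ε)P + εQ, where P is the true distribution and Q is arbitrary. A minimal sketch of drawing contaminated samples; the toy product distribution standing in for P and the all-ones outlier standing in for Q are purely illustrative:

```python
import numpy as np

def sample_contaminated(n, p, eps, rng):
    """Draw n samples from (1 - eps) * P + eps * Q, where P is a toy
    product distribution on {-1, +1}^p and Q is an arbitrary corruption
    (here: all-ones vectors, standing in for adversarial outliers)."""
    samples = rng.choice([-1, 1], size=(n, p))     # clean samples from P
    outlier_mask = rng.random(n) < eps             # each row corrupted w.p. eps
    samples[outlier_mask] = np.ones(p, dtype=int)  # replace with samples from Q
    return samples, outlier_mask

rng = np.random.default_rng(0)
X, mask = sample_contaminated(1000, 5, 0.1, rng)
print(X.shape, mask.mean())  # roughly 10% of rows are corrupted
```

An estimator run on X must tolerate the corrupted rows without being told which ones they are; the mask exists here only to illustrate the generative process.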

Introduction

- Undirected graphical models (also known as Markov random fields (MRFs)) have gained significant attention as a tool for discovering and visualizing dependencies among covariates in multivariate data.
- Graphical models provide compact and structured representations of the joint distribution of multiple random variables using graphs that represent conditional independences between the individual random variables.
- They are used in domains as varied as natural language processing[37], image processing [9, 24, 26], spatial statistics [43] and computational biology [23], among others.
- An Ising model is a special instantiation of an MRF where each random variable Xs takes values in {−1, +1}, and the joint probability mass function is given by Pθ(x1, . . . , xp) ∝ exp( ∑_(s,t)∈E θst xs xt ), where E is the edge set of the underlying graph.
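Since each spin's conditional distribution given the rest is logistic in its neighbors, the model above can be simulated with a simple Gibbs sampler. A minimal sketch; the chain graph, edge weight, and sweep count are illustrative, not taken from the paper:

```python
import numpy as np

def gibbs_ising(theta, n_sweeps, rng):
    """Gibbs sampler for an Ising model with symmetric, zero-diagonal
    interaction matrix theta: P(x) ∝ exp(sum_{s<t} theta[s,t] x_s x_t).
    Each spin's conditional is logistic in its neighbours:
    P(X_s = +1 | rest) = sigmoid(2 * sum_t theta[s,t] x_t)."""
    p = theta.shape[0]
    x = rng.choice([-1, 1], size=p)            # random initial configuration
    for _ in range(n_sweeps):
        for s in range(p):
            field = theta[s] @ x               # diagonal is zero, so no self term
            prob_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[s] = 1 if rng.random() < prob_plus else -1
    return x

# toy chain graph on 4 nodes with weight 0.3 on each edge
p = 4
theta = np.zeros((p, p))
for s in range(p - 1):
    theta[s, s + 1] = theta[s + 1, s] = 0.3
rng = np.random.default_rng(0)
sample = gibbs_ising(theta, n_sweeps=100, rng=rng)
print(sample)  # a vector of +/-1 spins
```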

Highlights

- Undirected graphical models (also known as Markov random fields (MRFs)) have gained significant attention as a tool for discovering and visualizing dependencies among covariates in multivariate data
- We focus on the specific undirected graphical model sub-class of Ising models [29]
- An Ising model is a special instantiation of an MRF where each random variable Xs takes values in {−1, +1}, and the joint probability mass function is given by Pθ(x1, . . . , xp) ∝ exp( ∑_(s,t)∈E θst xs xt )
- We propose the first statistically optimal estimator for sparse logistic regression, and use that to provide estimators for learning Ising models
- In this work we provided the first statistically optimal robust estimators for learning Ising models in the high temperature regime
- Our estimators achieved optimal asymptotic error in the ε-contamination model, and high-probability deviation bounds in the uncontaminated setting

Results

- The authors notice that the slope is not drastically affected by ω, which suggests that the constant C(α) appearing in the results is O(1).
- In Figures 1(c) and 1(f), the authors notice the variation in the slope with increasing model width ω.
- While the current results study the case when ω < 1, it is interesting to note an increasing trend when ω ≥ 1, suggesting an explicit dependence on ω in the low-temperature regime

Conclusion

**Discussion and Future Work**

- In this work the authors provided the first statistically optimal robust estimators for learning Ising models in the high-temperature regime.
- The authors' focus was on designing estimators for the contamination model, i.e., one where a fraction of the data is arbitrarily corrupted.
- Another model of corruption - motivated by sensor networks and distributed computation, where node failures are common - is one where only a few features (nodes) are corrupted, and the authors still want to learn the appropriate graph structure for the uncontaminated nodes.

Summary

## Objectives:

The authors aim to design statistically optimal estimators in a high-dimensional scaling in which the number of nodes p, the number of edges k, and the maximal node degree d are allowed to increase to infinity as a function of the sample size n.

Related work

- In this work, we focus on the specific undirected graphical model sub-class of Ising models [29]. There has been a lot of work on learning Ising models in the uncontaminated setting, dating back to the classical work of Chow and Liu [8]. Csiszár and Talata [10] discuss pseudo-likelihood-based approaches for estimating the neighborhood of a given node in MRFs. Subsequently, a simple search-based method with provable guarantees is described in [6]. Later, Ravikumar et al. [42] showed that under an incoherence assumption, node-wise (regularized) estimators provably recover the correct dependency graph with a small number of samples. Recently, there has been a flurry of work [5, 30, 36, 47, 49] on computationally efficient estimators that recover the true graph structure without the incoherence assumption, including extensions to identity and independence testing [12]. However, all the aforementioned results are in the uncontaminated setting. More recently, Lindgren et al. [35] derived preliminary results for learning Ising models robustly, but their upper and lower bounds do not match. Moreover, their analysis primarily focuses on the robustness of the Sparsitron algorithm in [30], and they do not comprehensively explore the effect of the underlying graph and correlation structures.
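The node-wise approach of Ravikumar et al. [42] regresses each variable on all the others with ℓ1-regularized logistic regression and reads the neighborhood off the support of the fitted weights. A sketch of one such per-node regression via proximal gradient descent (ISTA); the step size, penalty, and toy data are illustrative, and this is the plain uncontaminated estimator, not the paper's robust version:

```python
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_logistic(X, y, lam, step=0.1, n_iters=500):
    """l1-regularized logistic regression for labels y in {-1, +1},
    fitted by proximal gradient descent (ISTA).
    Minimizes  mean(log(1 + exp(-y * X @ w))) + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)
        # gradient of the logistic loss: -mean(y * x * sigmoid(-margin))
        grad = -(X * (y * (1.0 / (1.0 + np.exp(margins))))[:, None]).mean(axis=0)
        w = soft_threshold(w - step * grad, step * lam)
    return w

# recover the neighbourhood of node x0 in a toy example:
# x0 tends to agree with x1 and is independent of x2.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.choice([-1, 1], size=n)
x2 = rng.choice([-1, 1], size=n)
x0 = np.where(rng.random(n) < 0.8, x1, -x1)   # x0 = x1 with probability 0.8
w = l1_logistic(np.column_stack([x1, x2]), x0, lam=0.05)
print(np.round(w, 2))  # weight on x1 clearly nonzero, weight on x2 near zero
```

The estimated neighborhood of x0 is the set of coordinates with nonzero weight; repeating this regression at every node and symmetrizing the supports recovers the graph.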

Funding

- AP, VS and PR acknowledge the support of NSF via IIS-1955532, OAC-1934584, DARPA via HR00112020006, and ONR via N000141812861
- SB and AP acknowledge the support of NSF via DMS-17130003 and CCF-1763734

References

- Mehmet Eren Ahsen and Mathukumalli Vidyasagar. An approach to one-bit compressed sensing based on probably approximately correct learning theory. The Journal of Machine Learning Research, 20(1):408–430, 2019.
- DF Andrews, PJ Bickel, FR Hampel, PJ Huber, WH Rogers, and JW Tukey. Robust estimates of location: Survey and advances, 1972.
- Sivaraman Balakrishnan, Simon S Du, Jerry Li, and Aarti Singh. Computationally efficient robust sparse estimation in high dimensions. In Conference on Learning Theory, pages 169–212, 2017.
- Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2):192–225, 1974.
- Guy Bresler. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 771–782, 2015.
- Guy Bresler, Elchanan Mossel, and Allan Sly. Reconstruction of Markov random fields from samples: Some observations and algorithms. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 343–356.
- Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60. ACM, 2017.
- C Chow and Cong Liu. Approximating discrete probability distributions with dependence trees. IEEE transactions on Information Theory, 14(3):462–467, 1968.
- George R Cross and Anil K Jain. Markov random field texture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, (1):25–39, 1983.
- Imre Csiszár and Zsolt Talata. Consistent estimation of the basic neighborhood of Markov random fields. The Annals of Statistics, pages 123–145, 2006.
- Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, and Anthimos Vardis Kandiros. Estimating Ising models from one sample. arXiv preprint arXiv:2004.09370, 2020.
- Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. Testing Ising models. IEEE Transactions on Information Theory, 65(11):6829–6852, 2019.
- Christopher De Sa, Kunle Olukotun, and Christopher Ré. Ensuring rapid mixing and low bias for asynchronous Gibbs sampling. In JMLR Workshop and Conference Proceedings, volume 48, page 1567. NIH Public Access, 2016.
- Luc Devroye, Abbas Mehrabian, Tommy Reddad, et al. The minimax learning rates of normal and ising undirected graphical models. Electronic Journal of Statistics, 14(1):2338–2361, 2020.
- Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 655–664. IEEE, 2016.
- Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning, pages 999–1008, 2017.
- Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high-dimensions without the computational intractability. SIAM Journal on Computing, 48(2):742–864, 2019.
- Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. In Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 1596–1606, 2019.
- PL Dobruschin. The description of a random field by means of conditional probabilities and conditions of its regularity. Theory of Probability & Its Applications, 13(2):197–224, 1968.
- Roland L Dobrushin and Senya B Shlosman. Completely analytical interactions: constructive description. Journal of Statistical Physics, 46(5-6):983–1014, 1987.
- David L Donoho and Richard C Liu. The "automatic" robustness of minimum distance functionals. The Annals of Statistics, pages 552–586, 1988.
- David L Donoho and Richard C Liu. Geometrizing rates of convergence, iii. The Annals of Statistics, pages 668–701, 1991.
- Nir Friedman. Inferring cellular networks using probabilistic graphical models. Science, 303 (5659):799–805, 2004.
- Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):721–741, 1984.
- Friedrich Götze, Holger Sambale, and Arthur Sinulis. Higher order concentration for functions of weakly dependent random variables. Electron. J. Probab., 24:19 pp., 2019.
- Martin Hassner and Jack Sklansky. The use of markov random fields as models of texture. In Image Modeling, pages 185–198.
- Peter J Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.
- Peter J Huber. Robust statistics. In International Encyclopedia of Statistical Science, pages 1248–1251.
- Ernst Ising. Beitrag zur theorie des ferromagnetismus. Zeitschrift für Physik, 31(1):253–258, 1925.
- Adam Klivans and Raghu Meka. Learning graphical models using multiplicative weights. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 343–354. IEEE, 2017.
- Pravesh K Kothari, Jacob Steinhardt, and David Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.
- Christof Külske. Concentration inequalities for functions of Gibbs fields with application to diffraction and random Gibbs measures. Communications in Mathematical Physics, 239(1-2):29–51, 2003.
- H Künsch. Decay of correlations under Dobrushin's uniqueness condition and its applications. Communications in Mathematical Physics, 84(2):207–222, 1982.
- Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 665–674. IEEE, 2016.
- Erik M Lindgren, Vatsal Shah, Yanyao Shen, Alexandros G Dimakis, and Adam Klivans. On robust learning of ising models. NeurIPS Workshop on Relational Representation Learning, 2019.
- Andrey Y Lokhov, Marc Vuffray, Sidhant Misra, and Michael Chertkov. Optimal structure and parameter learning of Ising models. Science Advances, 4(3):e1700791, 2018.
- Christopher D Manning and Hinrich Schütze. Foundations of statistical natural language processing. 1999.
- Pascal Massart. Concentration inequalities and model selection, volume 6.
- Nicolai Meinshausen, Peter Bühlmann, et al. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006.
- Adarsh Prasad, Sivaraman Balakrishnan, and Pradeep Ravikumar. A unified approach to robust mean estimation. arXiv preprint arXiv:1907.00927, 2019.
- Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan, Pradeep Ravikumar, et al. Robust estimation via robust gradient estimation. Journal of the Royal Statistical Society Series B, 82 (3):601–627, 2020.
- Pradeep Ravikumar, Martin J Wainwright, John D Lafferty, et al. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319, 2010.
- Brian D Ripley. Spatial statistics, volume 575. John Wiley & Sons, 2005.
- Adam J Rothman, Peter J Bickel, Elizaveta Levina, Ji Zhu, et al. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515, 2008.
- Narayana P Santhanam and Martin J Wainwright. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7): 4117–4134, 2012.
- Daniel W Stroock and Boguslaw Zegarlinski. The logarithmic sobolev inequality for discrete spin systems on a lattice. Communications in Mathematical Physics, 149(1):175–193, 1992.
- Marc Vuffray, Sidhant Misra, Andrey Lokhov, and Michael Chertkov. Interaction screening: Efficient and sample-optimal learning of Ising models. In Advances in Neural Information Processing Systems, pages 2595–2603, 2016.
- Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press, 2019.
- Shanshan Wu, Sujay Sanghavi, and Alexandros G Dimakis. Sparse logistic regression learns all discrete pairwise graphical models. In Advances in Neural Information Processing Systems, pages 8069–8079, 2019.
- Yannis G Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov's entropy. The Annals of Statistics, pages 768–774, 1985.
