
# An introduction to variational methods for graphical models

Machine Learning, 37(2) (1999): 183-233


This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models...


• The problem of probabilistic inference in graphical models is the problem of computing a conditional probability distribution over the values of some of the nodes, given the values of other nodes.
• The authors often wish to calculate marginal probabilities in graphical models, in particular the probability of the observed evidence, P(E).
• Although inference algorithms do not simply compute the numerator and denominator of Eq. (1) and divide, they generally produce the likelihood as a by-product of the calculation of P(H|E).
• Algorithms that maximize likelihood generally make use of the calculation of P(H|E) as a subroutine.
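As a concrete (toy) illustration of these points, the sketch below runs brute-force inference in a hypothetical three-node binary Bayesian network; the network and its conditional probability tables are invented for the example, and the likelihood P(E) appears as the normalizer of P(H|E):

```python
import itertools

# Hypothetical 3-node binary Bayesian network A -> B -> C; the
# conditional probability tables below are made up for illustration.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) factored along the graph."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def posterior_and_likelihood(c_obs):
    """Sum the joint over the hidden nodes (A, B) for fixed evidence C=c_obs.
    The normalizer of the posterior is exactly the likelihood P(E)."""
    unnorm = {(a, b): joint(a, b, c_obs)
              for a, b in itertools.product((0, 1), repeat=2)}
    likelihood = sum(unnorm.values())  # P(E) falls out as a by-product
    posterior = {h: v / likelihood for h, v in unnorm.items()}  # P(H | E)
    return posterior, likelihood

post, like = posterior_and_likelihood(1)
```

This is the exhaustive-enumeration view only; exact algorithms such as the junction tree arrange the same sums more efficiently, but the likelihood still emerges as the normalization constant.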

• Viewed as a function of the parameters of the graphical model, for fixed E, P(E) is an important quantity known as the likelihood
• We provide a brief overview of the QMR-DT database here; for further details see Shwe et al. (1991)
• As in the case of the Boltzmann machine, we find that the variational parameters are linked via their Markov blankets and the consistency equation (Eq. (67)) can be interpreted as a local message-passing algorithm
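The mean-field consistency equations linked via Markov blankets can be sketched as a local message-passing loop. The toy Boltzmann-machine weights below are invented for illustration; each update of mu_i reads only the current values of unit i's neighbors (its Markov blanket):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Symmetric weights and biases for a hypothetical 3-unit Boltzmann
# machine (values are arbitrary, chosen only for illustration).
W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.8],
     [-0.3, 0.8, 0.0]]
b = [0.1, -0.2, 0.3]

def mean_field(W, b, iters=200):
    """Iterate the mean-field consistency equations
    mu_i = sigmoid(sum_j W_ij mu_j + b_i).
    Each coordinate update is a 'message' computed from unit i's
    Markov blanket; iteration continues to a fixed point."""
    n = len(b)
    mu = [0.5] * n  # initialize the variational parameters
    for _ in range(iters):
        for i in range(n):
            mu[i] = sigmoid(sum(W[i][j] * mu[j] for j in range(n)) + b[i])
    return mu

mu = mean_field(W, b)
```

At convergence every mu_i satisfies its own consistency equation simultaneously, which is what makes the procedure interpretable as local message passing.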

• The authors make a few remarks on the relationships between variational methods and stochastic methods, in particular the Gibbs sampler.
• In Gibbs sampling, the message-passing is simple: each node learns the current instantiation of its Markov blanket.
• With enough samples the node can estimate the distribution over its Markov blanket and determine its own statistics.
• The authors can quite generally treat parameters as additional nodes in a graphical model (cf. this volume) and thereby treat Bayesian inference on the same footing as generic probabilistic inference in a graphical model.
• This probabilistic inference problem is often intractable, and variational approximations can be useful.
• The ensemble is fit by minimizing the KL divergence between the approximating distribution and the posterior over parameters.
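As a rough illustration of fitting an ensemble by KL minimization, the sketch below searches for the factorized (mean-field-style) distribution Q closest in KL(Q||P) to a made-up two-variable distribution P; a real variational method would use closed-form coordinate updates rather than this grid search:

```python
import itertools
import math

# Hypothetical target distribution over two binary variables,
# invented for illustration (any normalized positive table would do).
P = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}

def kl_factorized(q1, q2):
    """KL(Q || P) for Q(x1, x2) = q(x1) q(x2), where q1 = Q(x1=1)
    and q2 = Q(x2=1) parameterize independent Bernoulli factors."""
    kl = 0.0
    for x1, x2 in itertools.product((0, 1), repeat=2):
        q = (q1 if x1 else 1 - q1) * (q2 if x2 else 1 - q2)
        if q > 0:
            kl += q * math.log(q / P[(x1, x2)])
    return kl

# Crude grid search over the variational parameters; because P is not
# a product distribution, the best factorized Q has strictly positive KL.
grid = [i / 100 for i in range(1, 100)]
q1, q2 = min(itertools.product(grid, grid),
             key=lambda pair: kl_factorized(*pair))
best = kl_factorized(q1, q2)
```

The residual KL at the optimum measures how much is lost by restricting the ensemble to a factorized family.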

• The authors have described a variety of applications of variational methods to problems of inference and learning in graphical models.
• The authors hope to have convinced the reader that variational methods can provide a powerful and elegant tool for graphical models, and that the algorithms that result are simple and intuitively appealing.
• It is important to emphasize that research on variational methods for graphical models is of quite recent origin, and there are many open problems and unresolved issues.

• Bathe, K. J. (1996). Finite Element Procedures. Englewood Cliffs, NJ: Prentice-Hall.
• Baum, L.E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41, 164-171.
• Cover, T., & Thomas, J. (1991). Elements of Information Theory. New York: John Wiley.
• Cowell, R. (in press). Introduction to inference for Bayesian networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60, 141-153.
• Dayan, P., Hinton, G. E., Neal, R., & Zemel, R. S. (1995). The Helmholtz Machine. Neural Computation, 7, 889-904.
• Dean, T., & Kanazawa, K. (1989). A model for reasoning about causality and persistence.
• Dechter, R. (in press). Bucket elimination: A unifying framework for probabilistic inference. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39, 1-38.
• Draper, D. L., & Hanks, S. (1994). Localized partial evaluation of belief networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Frey, B., Hinton, G. E., & Dayan, P. (1996). Does the wake-sleep algorithm learn good density estimators? In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Fung, R. & Favero, B. D. (1994). Backward simulation in Bayesian networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Galland, C. (1993). The limitations of deterministic Boltzmann machine learning. Network, 4, 355-379.
• Ghahramani, Z., & Hinton, G. E. (1996). Switching state-space models. University of
• Ghahramani, Z., & Jordan, M. I. (1997). Factorial Hidden Markov models. Machine Learning, 29, 245-273.
• Gilks, W., Thomas, A., & Spiegelhalter, D. (1994). A language and a program for complex Bayesian modelling. The Statistician, 43, 169-178.
• Heckerman, D. (in press). A tutorial on learning with Bayesian networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Henrion, M. (1991). Search-based methods to bound diagnostic probabilities in very large belief nets. Uncertainty and Artificial Intelligence: Proceedings of the Seventh
• Hinton, G. E., & Sejnowski, T. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing: Volume 1. Cambridge, MA: MIT Press.
• Hinton, G.E. & van Camp, D. (1993). Keeping neural networks simple by minimizing the description length of the weights. In Proceedings of the 6th Annual Workshop on
• Hinton, G. E., Dayan, P., Frey, B., & Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158-1161.
• Hinton, G. E., Sallans, B., & Ghahramani, Z. (in press). A hierarchical community of experts. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Horvitz, E. J., Suermondt, H. J., & Cooper, G. F. (1989). Bounded conditioning: Flexible inference for decisions under scarce resources. Uncertainty in Artificial Intelligence: Proceedings of the Fifth Conference. Mountain View, CA: Association for UAI.
• Jaakkola, T. S., & Jordan, M. I. (1996). Computing upper and lower bounds on likelihoods in intractable networks. Uncertainty and Artificial Intelligence: Proceedings of the
• Jaakkola, T. S. (1997). Variational methods for inference and estimation in graphical models. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
• Jaakkola, T. S., & Jordan, M. I. (1997a). Recursive algorithms for approximating probabilities in graphical models. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
• Jaakkola, T. S., & Jordan, M. I. (1997b). Bayesian logistic regression: a variational approach. In D. Madigan & P. Smyth (Eds.), Proceedings of the 1997 Conference on
• Jaakkola, T. S., & Jordan, M. I. (1997c). Variational methods and the QMR-DT database. Submitted to: Journal of Artificial Intelligence Research.
• Jaakkola, T. S., & Jordan. M. I. (in press). Improving the mean field approximation via the use of mixture distributions. In M. I. Jordan (Ed.), Learning in Graphical Models.
• Jensen, C. S., Kong, A., & Kjærulff, U. (1995). Blocking-Gibbs sampling in very large probabilistic expert systems. International Journal of Human-Computer Studies, 42, 647-666.
• Jensen, F. V., & Jensen, F. (1994). Optimal junction trees. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Jensen, F. V. (1996). An Introduction to Bayesian Networks. London: UCL Press.
• Jordan, M. I. (1994). A statistical approach to decision tree modeling. In M. Warmuth (Ed.), Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory. New York: ACM Press.
• Jordan, M. I., Ghahramani, Z., & Saul, L. K. (1997). Hidden Markov decision trees. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
• Kanazawa, K., Koller, D., & Russell, S. (1995). Stochastic simulation algorithms for dynamic probabilistic networks. Uncertainty and Artificial Intelligence: Proceedings of the Eleventh Conference. San Mateo, CA: Morgan Kaufmann.
• Kjærulff, U. (1990). Triangulation of graphs: Algorithms giving small total state space.
• Kjærulff, U. (1994). Reduction of computational complexity in Bayesian networks through removal of weak dependences. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• MacKay, D.J.C. (1997a). Ensemble learning for hidden Markov models. Unpublished manuscript. Department of Physics, University of Cambridge.
• MacKay, D.J.C. (1997b). Comparison of approximate methods for handling hyperparameters. Submitted to Neural Computation.
• MacKay, D.J.C. (in press). Introduction to Monte Carlo methods. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• McEliece, R.J., MacKay, D.J.C., & Cheng, J.-F. (1996) Turbo decoding as an instance of Pearl's "belief propagation algorithm." Submitted to: IEEE Journal on Selected Areas in Communication.
• Merz, C. J., & Murphy, P. M. (1996). UCI repository of machine learning databases. [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
• Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56, 71-113.
• Neal, R. (1993). Probabilistic inference using Markov chain Monte Carlo methods. University of Toronto Technical Report CRG-TR-93-1, Department of Computer Science.
• Neal, R., & Hinton, G. E. (in press). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Parisi, G. (1988). Statistical Field Theory. Redwood City, CA: Addison-Wesley.
• Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
• Peterson, C., & Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995-1019.
• Rockafellar, R. (1972). Convex Analysis. Princeton University Press.
• Rustagi, J. (1976). Variational Methods in Statistics. New York: Academic Press.
• Sakurai, J. (1985). Modern Quantum Mechanics. Redwood City, CA: Addison-Wesley.
• Saul, L. K., & Jordan, M. I. (1994). Learning in Boltzmann trees. Neural Computation, 6, 1173-1183.
• Saul, L. K., Jaakkola, T. S., & Jordan, M. I. (1996). Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4, 61-76.
• Saul, L. K., & Jordan, M. I. (1996). Exploiting tractable substructures in intractable networks. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Saul, L. K., & Jordan, M. I. (in press). A mean field learning algorithm for unsupervised neural networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Seung, S. (1995). Annealed theories of learning. In J.-H. Oh, C. Kwon, & S. Cho (Eds.), Neural Networks: The Statistical Mechanics Perspectives. Singapore: World Scientific.
• Shachter, R. D., Andersen, S. K., & Szolovits, P. (1994). Global conditioning for probabilistic inference in belief networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Shenoy, P. P. (1992). Valuation-based systems for Bayesian decision analysis. Operations Research, 40, 463-484.
• Shwe, M. A., Middleton, B., Heckerman, D. E., Henrion, M., Horvitz, E. J., Lehmann, H. P., & Cooper, G. F. (1991). Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Meth. Inform. Med., 30, 241-255.
• Smyth, P., Heckerman, D., & Jordan, M. I. (1997). Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9, 227-270.
• Waterhouse, S., MacKay, D.J.C. & Robinson, T. (1996). Bayesian methods for mixtures of experts. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Williams, C. K. I., & Hinton, G. E. (1991). Mean field networks that learn to discriminate temporally distorted strings. In Touretzky, D. S., Elman, J., Sejnowski, T., & Hinton, G. E., (Eds.), Proceedings of the 1990 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.