
# An introduction to variational methods for graphical models

Machine Learning, 37(2) (1999): 183-233


This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models...


• The problem of probabilistic inference in graphical models is the problem of computing a conditional probability distribution over the values of some of the nodes, given the values of other nodes.
• The authors often wish to calculate marginal probabilities in graphical models, in particular the probability of the observed evidence, P(E).
• Although inference algorithms do not simply compute the numerator and denominator of Eq. (1) and divide, they generally produce the likelihood as a by-product of the calculation of P(H|E).
• Algorithms that maximize likelihood generally make use of the calculation of P(H|E) as a subroutine.
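As a concrete (toy) illustration of these points, the sketch below runs brute-force inference in a hypothetical three-node binary Bayesian network; the network and its conditional probability tables are invented for the example, and the likelihood P(E) appears as the normalizer of P(H|E):

```python
import itertools

# Hypothetical 3-node binary Bayesian network A -> B -> C; the
# conditional probability tables below are made up for illustration.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) factored along the graph."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def posterior_and_likelihood(c_obs):
    """Sum the joint over the hidden nodes (A, B) for fixed evidence C=c_obs.
    The normalizer of the posterior is exactly the likelihood P(E)."""
    unnorm = {(a, b): joint(a, b, c_obs)
              for a, b in itertools.product((0, 1), repeat=2)}
    likelihood = sum(unnorm.values())  # P(E) falls out as a by-product
    posterior = {h: v / likelihood for h, v in unnorm.items()}  # P(H | E)
    return posterior, likelihood

post, like = posterior_and_likelihood(1)
```

This is the exhaustive-enumeration view only; exact algorithms such as the junction tree arrange the same sums more efficiently, but the likelihood still emerges as the normalization constant.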

• Viewed as a function of the parameters of the graphical model, for fixed E, P(E) is an important quantity known as the likelihood
• We provide a brief overview of the QMR-DT database here; for further details see Shwe et al. (1991)
• As in the case of the Boltzmann machine, we find that the variational parameters are linked via their Markov blankets and the consistency equation (Eq. (67)) can be interpreted as a local message-passing algorithm
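The mean-field consistency equations linked via Markov blankets can be sketched as a local message-passing loop. The toy Boltzmann-machine weights below are invented for illustration; each update of mu_i reads only the current values of unit i's neighbors (its Markov blanket):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Symmetric weights and biases for a hypothetical 3-unit Boltzmann
# machine (values are arbitrary, chosen only for illustration).
W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.8],
     [-0.3, 0.8, 0.0]]
b = [0.1, -0.2, 0.3]

def mean_field(W, b, iters=200):
    """Iterate the mean-field consistency equations
    mu_i = sigmoid(sum_j W_ij mu_j + b_i).
    Each coordinate update is a 'message' computed from unit i's
    Markov blanket; iteration continues to a fixed point."""
    n = len(b)
    mu = [0.5] * n  # initialize the variational parameters
    for _ in range(iters):
        for i in range(n):
            mu[i] = sigmoid(sum(W[i][j] * mu[j] for j in range(n)) + b[i])
    return mu

mu = mean_field(W, b)
```

At convergence every mu_i satisfies its own consistency equation simultaneously, which is what makes the procedure interpretable as local message passing.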

• The authors make a few remarks on the relationships between variational methods and stochastic methods, in particular the Gibbs sampler.
• In Gibbs sampling, the message-passing is simple: each node learns the current instantiation of its Markov blanket.
• With enough samples the node can estimate the distribution over its Markov blanket and determine its own statistics.
• The authors can quite generally treat parameters as additional nodes in a graphical model (cf. this volume) and thereby treat Bayesian inference on the same footing as generic probabilistic inference in a graphical model.
• This probabilistic inference problem is often intractable, and variational approximations can be useful.
• The ensemble is fit by minimizing the KL divergence between the approximating distribution and the posterior over parameters.
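As a rough illustration of fitting an ensemble by KL minimization, the sketch below searches for the factorized (mean-field-style) distribution Q closest in KL(Q||P) to a made-up two-variable distribution P; a real variational method would use closed-form coordinate updates rather than this grid search:

```python
import itertools
import math

# Hypothetical target distribution over two binary variables,
# invented for illustration (any normalized positive table would do).
P = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}

def kl_factorized(q1, q2):
    """KL(Q || P) for Q(x1, x2) = q(x1) q(x2), where q1 = Q(x1=1)
    and q2 = Q(x2=1) parameterize independent Bernoulli factors."""
    kl = 0.0
    for x1, x2 in itertools.product((0, 1), repeat=2):
        q = (q1 if x1 else 1 - q1) * (q2 if x2 else 1 - q2)
        if q > 0:
            kl += q * math.log(q / P[(x1, x2)])
    return kl

# Crude grid search over the variational parameters; because P is not
# a product distribution, the best factorized Q has strictly positive KL.
grid = [i / 100 for i in range(1, 100)]
q1, q2 = min(itertools.product(grid, grid),
             key=lambda pair: kl_factorized(*pair))
best = kl_factorized(q1, q2)
```

The residual KL at the optimum measures how much is lost by restricting the ensemble to a factorized family.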

• The authors have described a variety of applications of variational methods to problems of inference and learning in graphical models.
• The authors hope to have convinced the reader that variational methods can provide a powerful and elegant tool for graphical models, and that the algorithms that result are simple and intuitively appealing.
• It is important to emphasize that research on variational methods for graphical models is of quite recent origin, and there are many open problems and unresolved issues.

• Bathe, K. J. (1996). Finite Element Procedures. Englewood Cliffs, NJ: Prentice-Hall.
• Baum, L.E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41, 164-171.
• Cover, T., & Thomas, J. (1991). Elements of Information Theory. New York: John Wiley.
• Cowell, R. (in press). Introduction to inference for Bayesian networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60, 141-153.
• Dayan, P., Hinton, G. E., Neal, R., & Zemel, R. S. (1995). The Helmholtz Machine. Neural Computation, 7, 889-904.
• Dean, T., & Kanazawa, K. (1989). A model for reasoning about causality and persistence.
• Dechter, R. (in press). Bucket elimination: A unifying framework for probabilistic inference. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39, 1-38.
• Draper, D. L., & Hanks, S. (1994). Localized partial evaluation of belief networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Frey, B., Hinton, G. E., & Dayan, P. (1996). Does the wake-sleep algorithm learn good density estimators? In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Fung, R. & Favero, B. D. (1994). Backward simulation in Bayesian networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Galland, C. (1993). The limitations of deterministic Boltzmann machine learning. Network, 4, 355-379.
• Ghahramani, Z., & Hinton, G. E. (1996). Switching state-space models. University of
• Ghahramani, Z., & Jordan, M. I. (1997). Factorial Hidden Markov models. Machine Learning, 29, 245-273.
• Gilks, W., Thomas, A., & Spiegelhalter, D. (1994). A language and a program for complex Bayesian modelling. The Statistician, 43, 169-178.
• Heckerman, D. (in press). A tutorial on learning with Bayesian networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Henrion, M. (1991). Search-based methods to bound diagnostic probabilities in very large belief nets. Uncertainty and Artificial Intelligence: Proceedings of the Seventh
• Hinton, G. E., & Sejnowski, T. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing: Volume 1. Cambridge, MA: MIT Press.
• Hinton, G.E. & van Camp, D. (1993). Keeping neural networks simple by minimizing the description length of the weights. In Proceedings of the 6th Annual Workshop on
• Hinton, G. E., Dayan, P., Frey, B., & Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158-1161.
• Hinton, G. E., Sallans, B., & Ghahramani, Z. (in press). A hierarchical community of experts. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Horvitz, E. J., Suermondt, H. J., & Cooper, G. F. (1989). Bounded conditioning: Flexible inference for decisions under scarce resources. Uncertainty in Artificial Intelligence: Proceedings of the Fifth Conference. Mountain View, CA: Association for UAI.
• Jaakkola, T. S., & Jordan, M. I. (1996). Computing upper and lower bounds on likelihoods in intractable networks. Uncertainty and Artificial Intelligence: Proceedings of the
• Jaakkola, T. S. (1997). Variational methods for inference and estimation in graphical models. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
• Jaakkola, T. S., & Jordan, M. I. (1997a). Recursive algorithms for approximating probabilities in graphical models. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
• Jaakkola, T. S., & Jordan, M. I. (1997b). Bayesian logistic regression: a variational approach. In D. Madigan & P. Smyth (Eds.), Proceedings of the 1997 Conference on
• Jaakkola, T. S., & Jordan, M. I. (1997c). Variational methods and the QMR-DT database. Submitted to: Journal of Artificial Intelligence Research.
• Jaakkola, T. S., & Jordan. M. I. (in press). Improving the mean field approximation via the use of mixture distributions. In M. I. Jordan (Ed.), Learning in Graphical Models.
• Jensen, C. S., Kong, A., & Kjærulff, U. (1995). Blocking-Gibbs sampling in very large probabilistic expert systems. International Journal of Human-Computer Studies, 42, 647-666.
• Jensen, F. V., & Jensen, F. (1994). Optimal junction trees. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Jensen, F. V. (1996). An Introduction to Bayesian Networks. London: UCL Press.
• Jordan, M. I. (1994). A statistical approach to decision tree modeling. In M. Warmuth (Ed.), Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory. New York: ACM Press.
• Jordan, M. I., Ghahramani, Z., & Saul, L. K. (1997). Hidden Markov decision trees. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
• Kanazawa, K., Koller, D., & Russell, S. (1995). Stochastic simulation algorithms for dynamic probabilistic networks. Uncertainty and Artificial Intelligence: Proceedings of the Eleventh Conference. San Mateo, CA: Morgan Kaufmann.
• Kjærulff, U. (1990). Triangulation of graphs: Algorithms giving small total state space.
• Kjærulff, U. (1994). Reduction of computational complexity in Bayesian networks through removal of weak dependences. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• MacKay, D.J.C. (1997a). Ensemble learning for hidden Markov models. Unpublished manuscript. Department of Physics, University of Cambridge.
• MacKay, D.J.C. (1997b). Comparison of approximate methods for handling hyperparameters. Submitted to Neural Computation.
• MacKay, D.J.C. (in press). Introduction to Monte Carlo methods. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• McEliece, R.J., MacKay, D.J.C., & Cheng, J.-F. (1996) Turbo decoding as an instance of Pearl's "belief propagation algorithm." Submitted to: IEEE Journal on Selected Areas in Communication.
• Merz, C. J., & Murphy, P. M. (1996). UCI repository of machine learning databases. [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
• Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56, 71-113.
• Neal, R. (1993). Probabilistic inference using Markov chain Monte Carlo methods. University of Toronto Technical Report CRG-TR-93-1, Department of Computer Science.
• Neal, R., & Hinton, G. E. (in press). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Parisi, G. (1988). Statistical Field Theory. Redwood City, CA: Addison-Wesley.
• Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
• Peterson, C., & Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995-1019.
• Rockafellar, R. (1972). Convex Analysis. Princeton University Press.
• Rustagi, J. (1976). Variational Methods in Statistics. New York: Academic Press.
• Sakurai, J. (1985). Modern Quantum Mechanics. Redwood City, CA: Addison-Wesley.
• Saul, L. K., & Jordan, M. I. (1994). Learning in Boltzmann trees. Neural Computation, 6, 1173-1183.
• Saul, L. K., Jaakkola, T. S., & Jordan, M. I. (1996). Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4, 61-76.
• Saul, L. K., & Jordan, M. I. (1996). Exploiting tractable substructures in intractable networks. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Saul, L. K., & Jordan, M. I. (in press). A mean field learning algorithm for unsupervised neural networks. In M. I. Jordan (Ed.), Learning in Graphical Models. Norwell, MA: Kluwer Academic Publishers.
• Seung, S. (1995). Annealed theories of learning. In J.-H. Oh, C. Kwon, & S. Cho (Eds.), Neural Networks: The Statistical Mechanics Perspectives. Singapore: World Scientific.
• Shachter, R. D., Andersen, S. K., & Szolovits, P. (1994). Global conditioning for probabilistic inference in belief networks. Uncertainty and Artificial Intelligence: Proceedings of the Tenth Conference. San Mateo, CA: Morgan Kaufmann.
• Shenoy, P. P. (1992). Valuation-based systems for Bayesian decision analysis. Operations Research, 40, 463-484.
• Shwe, M. A., Middleton, B., Heckerman, D. E., Henrion, M., Horvitz, E. J., Lehmann, H. P., & Cooper, G. F. (1991). Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Meth. Inform. Med., 30, 241-255.
• Smyth, P., Heckerman, D., & Jordan, M. I. (1997). Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9, 227-270.
• Waterhouse, S., MacKay, D.J.C. & Robinson, T. (1996). Bayesian methods for mixtures of experts. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8.
• Williams, C. K. I., & Hinton, G. E. (1991). Mean field networks that learn to discriminate temporally distorted strings. In Touretzky, D. S., Elman, J., Sejnowski, T., & Hinton, G. E., (Eds.), Proceedings of the 1990 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.