# Dependency networks for inference, collaborative filtering, and data visualization

Journal of Machine Learning Research, pp. 49-75, 2001.


Keywords:

conditional distribution, basic property, probabilistic relationship, dependency network, Bayesian network


Abstract:

We describe a graphical model for probabilistic relationships--an alternative to the Bayesian network--called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for ...


Introduction

- The Bayesian network has proven to be a valuable tool for encoding, learning, and reasoning about probabilistic relationships.
- The authors introduce another graphical representation of such relationships called a dependency network.
- A dependency network is not useful for encoding causal relationships and is difficult to construct using a knowledge-based approach.
- There are straightforward and computationally efficient algorithms for learning both the structure and probabilities of a dependency network from data; and the learned model is quite useful for encoding and displaying predictive (i.e., dependence and independence) relationships.
- Dependency networks are well suited to the task of predicting preferences, a task often referred to as collaborative filtering, and are generally useful for probabilistic inference, the task of answering probabilistic queries.
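The learning procedure summarized above can be sketched for binary variables; this is a minimal toy version, not the paper's implementation. Each variable's conditional distribution is estimated against every other variable, and a variable is kept as a parent only when it noticeably shifts that conditional, a crude stand-in for the feature selection performed by the paper's classification/regression learners. The variable names and threshold below are illustrative.

```python
def learn_dependency_network(data, names, threshold=0.1):
    """Learn a toy dependency network over binary variables.

    For each target variable x_i, estimate P(x_i = 1 | x_j = v) for every
    other variable x_j; keep x_j as a parent when the two conditionals
    differ by more than `threshold`. Note the resulting graph may be
    cyclic, unlike a Bayesian network.
    """
    n = len(names)
    parents = {name: [] for name in names}
    cpts = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = {}
            for v in (0, 1):
                rows = [r for r in data if r[j] == v]
                # empirical P(x_i = 1 | x_j = v), with a neutral default
                p[v] = sum(r[i] for r in rows) / len(rows) if rows else 0.5
            if abs(p[1] - p[0]) > threshold:  # x_j is predictive of x_i
                parents[names[i]].append(names[j])
                cpts[(names[i], names[j])] = p
    return parents, cpts

# Toy binary data over (rain, sprinkler, wet); rows are observations
data = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0)]
parents, _ = learn_dependency_network(data, ["rain", "sprinkler", "wet"])
print(parents)
```

Here "rain" becomes a parent of "wet" and "wet" a parent of "rain", illustrating the cyclic structure that dependency networks permit.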

Highlights

- The Bayesian network has proven to be a valuable tool for encoding, learning, and reasoning about probabilistic relationships
- We have described a graphical representation for probabilistic dependencies, similar to the Bayesian network, called a dependency network
- As in a Bayesian network, the probability component consists of the probability of a node given its parents for each node (the local distributions)
- The parameterized model for each variable is the local distribution for that variable; and the structure of the network reflects any independencies discovered in the classification/regression process via feature selection
- As a result of this learning procedure, the dependency network is usually inconsistent; that is, it is not the case that the local distributions can be obtained via inference from a single joint distribution for the domain
- Experiments on real data show this approach to yield accurate predictions. In addition to their application to probabilistic inference, we have shown that dependency networks are useful for collaborative filtering (the task of predicting preferences) and for the visualization of acausal predictive relationships
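Inference from a (possibly inconsistent) dependency network uses ordered pseudo-Gibbs sampling: each variable is resampled in a fixed order from its local conditional. A minimal sketch for two binary variables follows; the conditional probabilities are made-up numbers for illustration, not values from the paper. Even when the two conditionals cannot be derived from any single joint distribution, the fixed-order sampler still defines a joint distribution via its stationary distribution.

```python
import random

def pseudo_gibbs(p_x_given_y, p_y_given_x, n_samples=20000, burn_in=1000, seed=0):
    """Ordered pseudo-Gibbs sampler over two binary variables.

    p_x_given_y[v] is P(x = 1 | y = v); p_y_given_x[v] is P(y = 1 | x = v).
    Variables are resampled in a fixed order, and the empirical frequency
    of x = 1 after burn-in approximates the marginal P(x = 1) under the
    chain's stationary distribution.
    """
    rng = random.Random(seed)
    x, y = 0, 0
    count_x1 = 0
    for t in range(burn_in + n_samples):
        x = 1 if rng.random() < p_x_given_y[y] else 0
        y = 1 if rng.random() < p_y_given_x[x] else 0
        if t >= burn_in:
            count_x1 += x
    return count_x1 / n_samples

# Hypothetical local conditionals (illustrative only)
est = pseudo_gibbs({0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.9})
print(f"estimated P(x = 1) ~ {est:.3f}")
```

For these particular conditionals the chain's stationary marginal works out to roughly P(x = 1) ≈ 0.59, which the sample frequency approaches.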

Results

- The authors measured the accuracy of recommendation lists produced by a baseline model: a Bayesian network with no arcs.
- This model recommends items based on their overall popularity, p(x_i = 1).
- A score in boldface corresponds to a statistically significant winner.
- The authors use ANOVA (e.g., McClave and Dietrich, 1988) with α = 0.1 to test for statistical significance.
- When the difference between two scores in the same column exceeds the value of RD (required difference), the difference is significant.
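The no-arc baseline can be sketched directly: with no parents, each item's local distribution reduces to its marginal popularity p(x_i = 1), so the recommendation list simply ranks items by the fraction of users who selected them. The dataset and item names below are invented for illustration.

```python
def popularity_recommendations(ratings, k=3):
    """Baseline recommender: rank items by marginal popularity p(x_i = 1).

    `ratings` is a list of sets, one per user, holding the items that user
    selected. Items are ranked by the fraction of users who chose them,
    mirroring a Bayesian network with no arcs.
    """
    n_users = len(ratings)
    items = set().union(*ratings)
    popularity = {i: sum(i in r for r in ratings) / n_users for i in items}
    return sorted(items, key=lambda i: -popularity[i])[:k]

# Toy visit data (illustrative, not from MS.COM/Nielsen/MSNBC)
users = [{"news", "mail"}, {"news", "sports"}, {"news", "mail", "weather"}]
print(popularity_recommendations(users, k=2))
```

Every user is shown the same list, which is why this model serves only as a floor against which the Bayesian-network and dependency-network recommenders are compared.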

Conclusion

**Summary and Future Work**

- The authors have described a graphical representation for probabilistic dependencies, similar to the Bayesian network, called a dependency network.
- As in a Bayesian network, the probability component consists of the probability of a node given its parents for each node (the local distributions).
- For computational reasons, the authors learn the structure and parameters of a dependency network for a given domain by independently performing a classification/regression for each variable in the domain, with inputs consisting of all variables except the target variable.
- Experiments on real data show this approach to yield accurate predictions.


- Table 1: Details for the datasets and score (bits per observation) for a Bayesian network (BN), dependency network (DN), and baseline model (BL) applied to these datasets. The lower the score, the higher the accuracy of the learned model
- Table 2: Number of users, items, and items per user for the datasets used in evaluating the algorithms
- Table 3: CF accuracy for the MS.COM, Nielsen, and MSNBC datasets. Higher scores indicate better performance. Statistically significant winners are shown in boldface
- Table 4: Number of predictions per second for the MS.COM, Nielsen, and MSNBC datasets
- Table 5: Computational resources for model learning

Related Work

- Before we consider new applications of dependency networks, we review related work on the basic concepts. As we have already mentioned, several researchers who developed Markov networks began with an examination of what we call consistent dependency networks. For an excellent discussion of this development as well as original contributions in this area, see Besag (1974). Besag (1975) also described an approach called pseudo-likelihood estimation, in which the conditionals are learned directly, as in our approach, without respecting the consistency constraints. We use the name pseudo-Gibbs sampling to make a connection to his work. Tresp and Hofmann (1998) describe general dependency networks, calling them Markov blanket networks. They stated and proved Theorem 3, and evaluated the predictive accuracy of the representation on several datasets using local distributions consisting of conditional Parzen windows.
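Besag's pseudo-likelihood replaces the joint likelihood with the product, over data rows and variables, of each variable's local conditional given all the others, which is exactly the objective that decomposes when the conditionals are learned independently. A minimal sketch for two binary variables follows; the conditionals and data are hand-set, illustrative values.

```python
import math

def log_pseudo_likelihood(data, conditionals):
    """Log pseudo-likelihood (Besag, 1975): sum of log local conditionals.

    `conditionals[i]` maps the tuple of the other variables' values to
    P(x_i = 1 | rest). Each local term is scored on its own, so no joint
    distribution over all variables is ever constructed.
    """
    total = 0.0
    for row in data:
        for i, cond in enumerate(conditionals):
            rest = row[:i] + row[i + 1:]     # values of all other variables
            p1 = cond[rest]                   # P(x_i = 1 | rest)
            total += math.log(p1 if row[i] == 1 else 1.0 - p1)
    return total

# Two binary variables with hand-set local conditionals (illustrative)
conds = [{(0,): 0.3, (1,): 0.7},   # P(x0 = 1 | x1)
         {(0,): 0.4, (1,): 0.6}]   # P(x1 = 1 | x0)
data = [(1, 1), (0, 0), (1, 0)]
print(round(log_pseudo_likelihood(data, conds), 3))
```

Because the objective is a sum of independent per-variable terms, maximizing it reduces to fitting each conditional separately, which is the computational motivation the paper cites for learning dependency networks this way.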

References

- Bartlett, M. 1955. An Introduction to Stochastic Processes. University Press, Cambridge.
- Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36, 192-236.
- Besag, J. 1975. Statistical analysis of non-lattice data. The Statistician, 24, 179-195.
- Besag, J., Green, P., Higdon, D., & Mengersen, K. 1995. Bayesian computation and stochastic systems. Statistical Science, 10, 3-66.
- Bishop, C. 1995. Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
- Breese, J. S., Heckerman, D., & Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, Wisconsin. Morgan Kaufmann.
- Brook, D. 1964. On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbor systems. Biometrika, 51, 481-483.
- Buntine, W. 1991. Theory refinement on Bayesian networks. In Proceedings of Seventh Conference on Uncertainty in Artificial Intelligence, Los Angeles, CA, pp. 52-60. Morgan Kaufmann.
- Chickering, D., Heckerman, D., & Meek, C. 1997. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI. Morgan Kaufmann.
- Cho, G., & Meyer, C. 1999. Markov chain sensitivity by mean first passage times. Tech. rep. 112242-0199, North Carolina State University.
- Fowlkes, E., Freeny, A., & Landwehr, J. 1988. Evaluating logistic models for large contingency tables. Journal of the American Statistical Association, 83, 611-622.
- Frey, B., Hinton, G., & Dayan, P. 1996. Does the wake-sleep algorithm produce good density estimators? In Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural Information Processing Systems, Vol. 8, pp. 661-667. MIT Press.
- Friedman, N., & Goldszmidt, M. 1996. Learning Bayesian networks with local structure. In Proceedings of Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, OR, pp. 252-262. Morgan Kaufmann.
- Geman, S., & Geman, D. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-742.
- Gilks, W., Richardson, S., & Spiegelhalter, D. 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall.
- Heckerman, D., & Meek, C. 1997. Models and selection criteria for regression and classification. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI. Morgan Kaufmann.
- Jensen, F., Lauritzen, S., & Olesen, K. 1990. Bayesian updating in recursive graphical models by local computations. Computational Statistics Quarterly, 4, 269-282.
- Lauritzen, S. 1996. Graphical Models. Clarendon Press.
- Lauritzen, S., Dawid, A., Larsen, B., & Leimer, H. 1990. Independence properties of directed Markov fields. Networks, 20, 491-505.
- Lévy, P. 1948. Chaînes doubles de Markoff et fonctions aléatoires de deux variables.
- McClave, J., & Dietrich, F. 1988. Statistics. Dellen Publishing Company.
- McCullagh, P., & Nelder, J. 1989. Generalized Linear Models, Second Edition. Chapman and Hall, New York.
- Neal, R. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Tech. rep.
- Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
- Platt, J. 1999. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning. MIT Press.
- Resnik, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. 1994. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186. ACM.
- Sewell, W., & Shah, V. 1968. Social class, parental encouragement, and educational aspirations. American Journal of Sociology, 73, 559-572.
- Tresp, V., & Hofmann, R. 1998. Nonlinear Markov networks for continuous variables. In Advances in Neural Information Processing Systems 10, pp. 521-527. MIT Press.
- Whittaker, J. 1990. Graphical Models in Applied Multivariate Statistics. John Wiley and Sons.
