Dependency networks for inference, collaborative filtering, and data visualization

Journal of Machine Learning Research, pp. 49-75, 2001.

Cited by 606
Other links: dblp.uni-trier.de | dl.acm.org | academic.microsoft.com
Keywords
conditional distribution, basic property, probabilistic relationship, dependency network, Bayesian network, more (8+)

Abstract

We describe a graphical model for probabilistic relationships--an alternative to the Bayesian network--called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for ...

Introduction
  • The Bayesian network has proven to be a valuable tool for encoding, learning, and reasoning about probabilistic relationships.
  • The authors introduce another graphical representation of such relationships called a dependency network.
  • A dependency network is not useful for encoding causal relationships and is difficult to construct using a knowledge-based approach.
  • There are straightforward and computationally efficient algorithms for learning both the structure and probabilities of a dependency network from data (a learning sketch follows this list); and the learned model is quite useful for encoding and displaying predictive (i.e., dependence and independence) relationships.
  • Dependency networks are well suited to the task of predicting preferences, a task often referred to as collaborative filtering, and are generally useful for probabilistic inference, the task of answering probabilistic queries.
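A minimal sketch of that learning procedure, assuming a 0/1 data matrix X with one column per variable. L1-regularized logistic regression stands in here for the probabilistic decision trees the paper actually learns, and its nonzero coefficients play the role of feature selection; the function name and regularization setting are illustrative, not the authors' implementation.

```python
# Sketch: learn a dependency network by fitting one conditional model per
# variable, with all remaining variables as candidate inputs.  The inputs
# retained by feature selection (nonzero coefficients) become the parents
# of the target node, so the resulting directed graph may contain cycles.
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_dependency_network(X, C=0.1):
    """X: binary array of shape (n_cases, n_vars); each column is assumed to
    take both values somewhere in the data.  Returns the local distributions
    (one classifier per variable) and the parent sets."""
    n_vars = X.shape[1]
    local_models, parents = {}, {}
    for i in range(n_vars):
        others = [j for j in range(n_vars) if j != i]
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, others], X[:, i])          # classification for variable i
        local_models[i] = (clf, others)
        # An arc points from each selected input to the target variable.
        parents[i] = [others[k] for k, w in enumerate(clf.coef_[0]) if w != 0.0]
    return local_models, parents
```

Because each regression is run independently, variable i may select j as a parent while j also selects i, which is exactly the potential cyclicity that distinguishes the graph from a Bayesian network.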
Highlights
  • The Bayesian network has proven to be a valuable tool for encoding, learning, and reasoning about probabilistic relationships
  • We have described a graphical representation for probabilistic dependencies, similar to the Bayesian network, called a dependency network
  • As in a Bayesian network, the probability component consists of the probability of a node given its parents for each node (the local distributions)
  • The parameterized model for each variable is the local distribution for that variable; and the structure of the network reflects any independencies discovered in the classification/regression process via feature selection
  • As a result of this learning procedure, the dependency network is usually inconsistent; that is, it is not the case that the local distributions can be obtained via inference from a single joint distribution for the domain (a sampling sketch follows this list)
  • Experiments on real data show this approach to yield accurate predictions. In addition to their application to probabilistic inference, we have shown that dependency networks are useful for collaborative filtering (the task of predicting preferences) and for the visualization of acausal predictive relationships
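Because the learned local distributions generally cannot be obtained from any single joint distribution, general probabilistic queries are answered by Gibbs sampling directly from those conditionals (the pseudo-Gibbs sampling named in the related work below). A rough sketch under the same assumptions as the learning sketch above, where each local model is a fitted binary classifier exposing predict_proba:

```python
import numpy as np

def pseudo_gibbs_sample(local_models, n_vars, n_sweeps=1000, burn_in=100, rng=None):
    """Repeatedly resample each variable from its local conditional given the
    current values of all other variables, in a fixed order, and collect the
    states visited after burn-in as an empirical joint distribution."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=n_vars)          # arbitrary starting state
    samples = []
    for sweep in range(n_sweeps):
        for i in range(n_vars):
            clf, others = local_models[i]
            p1 = clf.predict_proba(x[others].reshape(1, -1))[0, 1]
            x[i] = rng.random() < p1             # draw x_i ~ p(x_i | rest)
        if sweep >= burn_in:
            samples.append(x.copy())
    return np.asarray(samples)
```

A query p(y | z) can then be estimated as the fraction of retained samples consistent with z in which y also holds; the paper describes refinements for handling evidence that this sketch omits.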
Results
  • The authors measured the accuracy of recommendation lists produced by a baseline model: a Bayesian network with no arcs.
  • This model recommends items based on their overall popularity, p(x_i = 1) (a sketch follows this list).
  • A score in boldface corresponds to a statistically significant winner.
  • The authors use ANOVA (e.g., McClave and Dietrich, 1988) with α = 0.1 to test for statistical significance.
  • When the difference between two scores in the same column exceeds the value of RD (required difference), the difference is significant
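A sketch of that baseline, assuming the same binary user-by-item matrix X as above: with no arcs, each item's local distribution collapses to its marginal p(x_i = 1), so every user receives the same ranking, ordered by item popularity. The function name is illustrative.

```python
import numpy as np

def popularity_baseline(X, shown_items):
    """Rank the items a user has not already seen by overall popularity
    p(x_i = 1), estimated as the column mean of the binary matrix X."""
    popularity = X.mean(axis=0)                  # marginal p(x_i = 1) per item
    candidates = [i for i in range(X.shape[1]) if i not in shown_items]
    return sorted(candidates, key=lambda i: popularity[i], reverse=True)
```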
Conclusion
  • Summary and Future Work

    The authors have described a graphical representation for probabilistic dependencies, similar to the Bayesian network, called a dependency network.
  • As in a Bayesian network, the probability component consists of the probability of a node given its parents for each node (the local distributions).
  • For computational reasons, the authors learn the structure and parameters of a dependency network for a given domain by independently performing a classification/regression for each variable in the domain, with inputs consisting of all variables except the target variable.
  • Experiments on real data show this approach to yield accurate predictions (a collaborative-filtering sketch follows this list)
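As a rough sketch of how the learned model could produce a recommendation list, reusing the hypothetical local_models structure from the learning sketch: for every item the user has not seen, evaluate that item's local distribution given the user's observed items and rank by the resulting probability. Since every input of each local model is observed in this query, no Gibbs sampling is needed here.

```python
import numpy as np

def recommend(local_models, user_vector, shown_items, top_k=10):
    """user_vector: 0/1 NumPy vector over all items for one user.  Scores
    each unseen item i by p(x_i = 1 | all other items) taken directly from
    its local distribution, then returns the top_k highest-scoring items."""
    scores = {}
    for i, (clf, others) in local_models.items():
        if i in shown_items:
            continue
        scores[i] = clf.predict_proba(user_vector[others].reshape(1, -1))[0, 1]
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```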
Tables
  • Table 1: Details of the datasets, and Score (bits per observation) for a Bayesian network (BN), a dependency network (DN), and the baseline model (BL) applied to these datasets. The lower the Score, the higher the accuracy of the learned model
  • Table 2: Number of users, items, and items per user for the datasets used in evaluating the algorithms
  • Table 3: CF accuracy for the MS.COM, Nielsen, and MSNBC datasets. Higher scores indicate better performance. Statistically significant winners are shown in boldface
  • Table 4: Number of predictions per second for the MS.COM, Nielsen, and MSNBC datasets
  • Table 5: Computational resources for model learning
Related Work
  • Before we consider new applications of dependency networks, we review related work on the basic concepts. As we have already mentioned, several researchers who developed Markov networks began with an examination of what we call consistent dependency networks. For an excellent discussion of this development, as well as original contributions in this area, see Besag (1974). Besag (1975) also described an approach called pseudo-likelihood estimation, in which the conditionals are learned directly, as in our approach, without respecting the consistency constraints (sketched below). We use the name pseudo-Gibbs sampling to make a connection to his work. Tresp and Hofmann (1998) describe general dependency networks, calling them Markov blanket networks. They stated and proved Theorem 3, and evaluated the predictive accuracy of the representation on several data sets using local distributions consisting of conditional Parzen windows.
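For concreteness, Besag's pseudo-likelihood replaces the joint likelihood with a product of exactly the conditionals that a dependency network parameterizes; in notation chosen here (N cases, d variables, x_{-i} denoting all variables other than x_i), not the paper's own:

```latex
% Pseudo-likelihood of the local-distribution parameters
% \theta = (\theta_1, \dots, \theta_d) over N data cases.
\mathrm{PL}(\theta)
  \;=\; \prod_{m=1}^{N} \prod_{i=1}^{d}
        p\!\left(x_i^{(m)} \,\middle|\, \mathbf{x}_{-i}^{(m)},\, \theta_i\right)
```

Maximizing each factor separately is the independent per-variable classification/regression used to learn a dependency network.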
References
  • Bartlett, M. 1955. An Introduction to Stochastic Processes. University Press, Cambridge.
  • Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B, 36, 192-236.
  • Besag, J. 1975. Statistical analysis of non-lattice data. The Statistician, 24, 179-195.
  • Besag, J., Green, P., Higdon, D., & Mengersen, K. 1995. Bayesian computation and stochastic systems. Statistical Science, 10, 3-66.
  • Bishop, C. 1995. Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
  • Breese, J. S., Heckerman, D., & Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, Wisconsin. Morgan Kaufmann.
  • Brook, D. 1964. On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbor systems. Biometrika, 51, 481-483.
  • Buntine, W. 1991. Theory refinement on Bayesian networks. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, Los Angeles, CA, pp. 52-60. Morgan Kaufmann.
  • Chickering, D., Heckerman, D., & Meek, C. 1997. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI. Morgan Kaufmann.
  • Cho, G., & Meyer, C. 1999. Markov chain sensitivity by mean first passage times. Tech. rep. 112242-0199, North Carolina State University.
  • Fowlkes, E., Freeny, A., & Landwehr, J. 1988. Evaluating logistic models for large contingency tables. Journal of the American Statistical Association, 83, 611-622.
  • Frey, B., Hinton, G., & Dayan, P. 1996. Does the wake-sleep algorithm produce good density estimators? In Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural Information Processing Systems, Vol. 8, pp. 661-667. MIT Press.
  • Friedman, N., & Goldszmidt, M. 1996. Learning Bayesian networks with local structure. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, OR, pp. 252-262. Morgan Kaufmann.
  • Geman, S., & Geman, D. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-742.
  • Gilks, W., Richardson, S., & Spiegelhalter, D. 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall.
  • Heckerman, D., & Meek, C. 1997. Models and selection criteria for regression and classification. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI. Morgan Kaufmann.
  • Jensen, F., Lauritzen, S., & Olesen, K. 1990. Bayesian updating in recursive graphical models by local computations. Computational Statistics Quarterly, 4, 269-282.
  • Lauritzen, S. 1996. Graphical Models. Clarendon Press.
  • Lauritzen, S., Dawid, A., Larsen, B., & Leimer, H. 1990. Independence properties of directed Markov fields. Networks, 20, 491-505.
  • Lévy, P. 1948. Chaînes doubles de Markoff et fonctions aléatoires de deux variables.
  • McClave, J., & Dietrich, F. 1988. Statistics. Dellen Publishing Company.
  • McCullagh, P., & Nelder, J. 1989. Generalized Linear Models, Second Edition. Chapman and Hall, New York.
  • Neal, R. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Tech.
  • Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
  • Platt, J. 1999. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning. MIT Press.
  • Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. 1994. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186. ACM.
  • Sewell, W., & Shah, V. 1968. Social class, parental encouragement, and educational aspirations. American Journal of Sociology, 73, 559-572.
  • Tresp, V., & Hofmann, R. 1998. Nonlinear Markov networks for continuous variables. In Advances in Neural Information Processing Systems 10, pp. 521-527. MIT Press.
  • Whittaker, J. 1990. Graphical Models in Applied Multivariate Statistics. John Wiley and Sons.