Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification

ICML 2020.

Keywords:
neural Datalog through time, possible event type, recurrent neural network, long short-term memory, past event

Abstract:

Learning how to predict future events from patterns of past events is difficult when the set of possible event types is large. Training an unrestricted neural model might overfit to spurious patterns. To exploit domain-specific knowledge of how past events might affect an event's present probability, we propose using a temporal deductive database …

Introduction
  • A common task is to predict the future from the past or to impute other missing events.
  • Often this is done by fitting a generative probability model.
  • A good starting point is the Hawkes process and its many variants, including neuralized versions based on LSTMs.
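
    For concreteness, the classical multivariate Hawkes process defines, for each event type k, a conditional intensity that is boosted by past events and then decays back toward a base rate. The exponential-kernel form below is the standard parameterization (the notation is ours, not the paper's):

        % lambda_k(t): instantaneous rate of events of type k at time t
        % mu_k: base rate;  alpha_{k,k_i}: excitation added by a past event of type k_i;
        % delta_{k,k_i}: decay rate of that excitation
        \lambda_k(t) \;=\; \mu_k \;+\; \sum_{i \,:\, t_i < t} \alpha_{k,k_i}\, e^{-\delta_{k,k_i}\,(t - t_i)}

    Neuralized variants such as the neural Hawkes process instead read the intensities \lambda_k(t) off the hidden state of a continuous-time LSTM, which lets past events interact nonlinearly (and even inhibit future events) rather than contribute additively.
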
Highlights
  • Temporal sequences are abundant in applied machine learning.
  • A good starting point is the Hawkes process and its many variants, including neuralized versions based on LSTMs.
  • Our neural Datalog through time model's moderate further improvement results from its richer :- and <- rules related to tags.
  • In the RoboCup domain, we investigated how the model performance degrades if we remove each kind of rule from the neural Datalog through time model.
  • We proposed an extended notation to support temporal deductive databases. "Neural Datalog through time" allows the facts, embeddings, and probabilities to change over time, both by gradual drift and in response to discrete events.
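
    As a loose illustration of the last point (this is not the authors' system, and the predicates and the single rule of each kind below are invented for the example, echoing the RoboCup has_ball facts mentioned later), a temporal deductive database can be mimicked in a few lines: deduction rules (written :- in Datalog) derive facts from other facts, while update rules (written <- in NDTT) add or remove base facts when an event occurs.

        # Toy sketch only: mimics how ":-" deduction rules and "<-" update rules
        # can make the set of true facts (and hence the set of possible event types)
        # change in response to events.  NDTT additionally attaches a time-varying
        # neural embedding to every fact; that part is omitted here.

        facts = {("team", "red"), ("team", "blue"), ("has_ball", "red1")}

        def deduce(base_facts):
            """Deduction (':-') rules: derive facts from other facts until closure."""
            derived = set(base_facts)
            changed = True
            while changed:
                changed = False
                # Example deduction rule:  can_kick(X) :- has_ball(X).
                for pred, arg in list(derived):
                    if pred == "has_ball" and ("can_kick", arg) not in derived:
                        derived.add(("can_kick", arg))
                        changed = True
            return derived

        def update(base_facts, event):
            """Update ('<-') rules: an observed event adds and/or removes base facts."""
            name, args = event
            # Example update rule:  has_ball(Y) <- pass(X, Y)   (and X loses the ball).
            if name == "pass":
                giver, receiver = args
                base_facts.discard(("has_ball", giver))
                base_facts.add(("has_ball", receiver))
            return base_facts

        # One event arrives: red1 passes to red7.
        facts = update(facts, ("pass", ("red1", "red7")))
        print(sorted(deduce(facts)))
        # -> can_kick(red7) and has_ball(red7) now hold; the red1 versions do not.

    In NDTT proper, each fact also carries an embedding computed by a neural net whose topology follows the rules that proved the fact, and special facts define the possible event types and their intensities; the sketch above captures only the symbolic add/remove behaviour.
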
Methods
  • In several continuous-time domains, the authors exhibit informed models specified using neural Datalog through time (NDTT).
  • The authors evaluate these models on their held-out log-likelihood, and on their success at predicting the time and type of each held-out event (a generic sketch of the likelihood computation appears after this list).
  • The authors compare with the unrestricted neural Hawkes process (NHP) and with Know-Evolve (KE) and DyRep. Experimental details are given in Appendix F.
  • The authors' code and datasets are available at the URL given in §2.
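
    The held-out log-likelihood referenced above is the usual point-process objective: the sum of log-intensities at the events that occurred, minus the integral of the total intensity over the observation window. The following is a generic sketch, not the authors' code; the Monte Carlo approximation of the integral is one common choice, and the set of candidate event types is passed in explicitly for brevity, whereas in NDTT that set is itself determined by the database at each time.

        import math
        import random

        def log_likelihood(events, intensity, event_types, t_end, num_samples=1000):
            """Log-likelihood of one event sequence under a multivariate point process.

            events      : list of (time, type) pairs observed on [0, t_end], sorted by time
            intensity   : function (k, t, history) -> lambda_k(t), the rate of type k at
                          time t given the events strictly before t
            event_types : iterable of candidate event types
            """
            ll = 0.0
            # Term 1: each observed event contributes log lambda_{k_i}(t_i).
            for i, (t, k) in enumerate(events):
                ll += math.log(intensity(k, t, events[:i]))
            # Term 2: subtract the integral of the total intensity over [0, t_end],
            # approximated by Monte Carlo with uniformly sampled times.
            acc = 0.0
            for _ in range(num_samples):
                t = random.uniform(0.0, t_end)
                history = [e for e in events if e[0] < t]
                acc += sum(intensity(k, t, history) for k in event_types)
            ll -= t_end * acc / num_samples
            return ll

        # Tiny usage with a constant-rate (Poisson) intensity, just to show the interface:
        constant = lambda k, t, history: 0.5
        print(log_likelihood([(0.3, "pass"), (1.2, "goal")], constant, ["pass", "goal"], t_end=2.0))
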
Results
  • The authors used minimum Bayes risk (§4) to predict events in test data (see the formulas after this list).
  • Figure 2 shows that the NDTT model enjoys consistently lower error than strong competitors, across datasets and prediction tasks.
  • KE handles relational information, but does not accommodate dynamic facts such as released(game of thrones) and has_ball(red8), which reconfigure the model architecture on the fly.
  • In the IPTV domain, DyRep handles dynamic facts and substantially outperforms KE.
  • The authors' NDTT model's moderate further improvement results from its richer :- and <- rules related to tags.
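
    Minimum Bayes risk prediction, used in the first bullet above, chooses the prediction that minimizes expected loss under the model. Writing \lambda(t) = \sum_k \lambda_k(t) for the total intensity after the most recent event at time t_i, and p(t) = \lambda(t) exp(-\int_{t_i}^{t} \lambda(s) ds) for the density of the next event time, the usual choices are L2 loss for the time and 0-1 loss for the type. This is our notation for how minimum Bayes risk is typically instantiated for neural Hawkes-style models, not a formula quoted from the paper:

        % time of the next event: the posterior mean minimizes expected squared error
        \hat{t} \;=\; \int_{t_i}^{\infty} t\, p(t)\, dt
        % type of the next event: the most probable next type minimizes expected 0-1 loss
        \hat{k} \;=\; \arg\max_{k} \int_{t_i}^{\infty} \frac{\lambda_k(t)}{\lambda(t)}\, p(t)\, dt

    Both integrals are typically approximated by Monte Carlo sampling of candidate times.
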
Conclusion
  • In the RoboCup domain, the authors' reimplementation of DyRep allows deletion of facts, whereas the original DyRep only allowed addition of facts.
  • Even with this improvement, it performs much worse than the full NDTT model.
  • The authors obtained “NDTT-” by dropping the team states, and “DyRep++” by not tracking the ball possessor.
  • The latter is still an enhancement to DyRep because it adds …
Tables
  • Table 1: Statistics of each dataset.
Related work
  • Past work (Sato, 1995; Poole, 2010; Richardson & Domingos, 2006; Raedt et al., 2007; Barany et al., 2017) has used logic programs to help define probabilistic relational models (Getoor & Taskar, 2007). These models do not make use of vector-space embeddings or neural networks. Nor do they usually have a temporal component. However, some other (directed) graphical model formalisms do allow the model architecture to be affected by data generated at earlier steps (Minka & Winn, 2008; van de Meent et al., 2018).

    Our “neural Datalog through time” framework uses a deductive database augmented with update rules to define and dynamically reconfigure the architecture of a neural generative model. Conditional neural net structure has been used …
Funding
  • The first author was supported by a Fellowship Award; the other JHU authors were supported by the National Science Foundation under Grant No. 1718846.
References
  • Acar, U. A. and Ley-Wild, R. Self-adjusting computation with Delta ML. In International School on Advanced Functional Programming, 2008.
  • Aldous, D., Ibragimov, I., Jacod, J., and Aldous, D. Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII — 1983, Lecture Notes in Mathematics. 1985.
  • Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. Learning to compose neural networks for question answering. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
  • Barany, V., ten Cate, B., Kimelfeld, B., Olteanu, D., and Vagena, Z. Declarative probabilistic programming with Datalog. ACM Transactions on Database Systems, 42(4):22:1–35, October 2017.
  • Bhattacharjya, D., Subramanian, D., and Gao, T. Proximal graphical event models. In Advances in Neural Information Processing Systems (NeurIPS), pp. 8136–8145, 2018.
  • Blei, D. and Lafferty, J. Correlated topic models. In Advances in Neural Information Processing Systems (NeurIPS), volume 18, pp. 147–154, 2006.
  • Blei, D. M. and Frazier, P. I. Distance-dependent Chinese restaurant processes. In Proceedings of the International Conference on Machine Learning (ICML), pp. 87–94, 2010.
  • Carbonell, P., jcdouet, Alves, H. C., and Tim, A. pyDatalog, 2016.
  • IEEE Transactions on Knowledge and Data Engineering, 1989.
  • Chen, D. L. and Mooney, R. J. Learning to sportscast: A test of grounded language acquisition. In Proceedings of the International Conference on Machine Learning (ICML), 2008.
  • Dyer, C., Kuncoro, A., Ballesteros, M., and Smith, N. A. Recurrent neural network grammars. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 199–209, 2016.
  • Elman, J. L. Finding structure in time. Cognitive Science, 1990.
  • Filardo, N. W. and Eisner, J. A flexible solver for finite arithmetic circuits. In Technical Communications of the 28th International Conference on Logic Programming (ICLP), 2012.
  • Fisher, R. A., Corbet, A. S., and Williams, C. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology, 12:42–58, 1943.
  • Getoor, L. and Taskar, B. (eds.). Introduction to Statistical Relational Learning. MIT Press, 2007.
  • Goller, C. and Küchler, A. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, volume 1, pp. 347–352, 1996.
  • Graves, A., Wayne, G., and Danihelka, I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
  • Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., et al. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
  • Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NeurIPS), 2017a.
  • Hamilton, W. L., Ying, R., and Leskovec, J. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017b.
  • Hammer, M. A. Self-Adjusting Machines. PhD thesis, Computer Science Department, University of Chicago, 2012.
  • Hawkes, A. G. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 1971.
  • Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 1997.
  • Kiddon, C., Zettlemoyer, L., and Choi, Y. Globally coherent text generation with neural checklist models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
  • Kingma, D. and Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • Kumar, A., Irsoy, O., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., and Socher, R. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the International Conference on Machine Learning (ICML), 2016.
  • Lample, G., Sablayrolles, A., Ranzato, M., Denoyer, L., and Jégou, H. Large memory layers with product keys. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • Le, P. and Zuidema, W. The forest convolutional network: Compositional distributional semantics with a neural chart and without binarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
  • Lin, C., Zhu, H., Gormley, M. R., and Eisner, J. M. Neural finite-state transducers: Beyond rational relations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 272–283, 2019.
  • Lin, C.-C. and Eisner, J. Neural particle smoothing for sampling from conditional sequence models. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2018.
  • Ling, W., Luís, T., Marujo, L., Astudillo, R. F., Amir, S., Dyer, C., Black, A. W., and Trancoso, I. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
  • Liniger, T. J. Multivariate Hawkes processes. Diss., Eidgenössische Technische Hochschule ETH Zürich, Nr. 18403, 2009.
  • Meek, C. Toward learning graphical and causal process models. In Uncertainty in Artificial Intelligence Workshop on Causal Inference: Learning and Prediction, volume 1274, pp. 43–48, 2014.
  • Mei, H. and Eisner, J. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
  • Mei, H., Qin, G., and Eisner, J. Imputing missing events in continuous-time event streams. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
  • Mikolov, T., Karafiát, M., Burget, L., Černocký, J., and Khudanpur, S. Recurrent neural network-based language model. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2010.
  • Minka, T. and Winn, J. Gates: A graphical notation for mixture models. In Advances in Neural Information Processing Systems (NeurIPS), pp. 1073–1080, 2008.
  • Natarajan, S., Bui, H. H., Tadepalli, P., Kersting, K., and Wong, W.-K. Logical hierarchical hidden Markov models for modeling user activities. In Proceedings of the International Conference on Inductive Logic Programming (ICILP), 2008.
  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in PyTorch. 2017.
  • Poole, D. AILog user manual, version 2.3, 2010.
  • Raedt, L. D., Kimmig, A., and Toivonen, H. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 2462–2467, 2007.
  • Richardson, M. and Domingos, P. Markov logic networks. Machine Learning, 2006.
  • Sato, T. A statistical learning method for logic programs with distribution semantics. In Proceedings of the International Conference on Logic Programming (ICLP), pp. 715–729, 1995.
  • Shelton, C. R. and Ciardo, G. Tutorial on structured continuous-time Markov processes. Journal of Artificial Intelligence Research, 51:725–778, 2014.
  • Socher, R., Huval, B., Manning, C. D., and Ng, A. Y. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
  • Sukhbaatar, S., Weston, J., Fergus, R., et al. End-to-end memory networks. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
  • Sundermeyer, M., Ney, H., and Schlüter, R. LSTM neural networks for language modeling. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2012.
  • Swift, T. and Warren, D. S. XSB: Extending Prolog with tabled logic programming. Theory and Practice of Logic Programming, 12(1–2):157–187, 2012.
  • Tai, K. S., Socher, R., and Manning, C. D. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2015.
  • Tran, K. M., Bisk, Y., Vaswani, A., Marcu, D., and Knight, K. Unsupervised neural hidden Markov models. In Proceedings of the Workshop on Structured Prediction for NLP, pp. 63–71, Austin, TX, November 2016.
  • Xu, H., Luo, D., and Carin, L. Online continuous-time tensor factorization based on pairwise interactive point processes. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • Zhang, X., Lu, L., and Lapata, M. Top-down tree long short-term memory networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 310–320, San Diego, California, June 2016.
  • Trivedi, R., Dai, H., Wang, Y., and Song, L. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
  • Trivedi, R., Farajtabar, M., Biswal, P., and Zha, H. DyRep: Learning representations over dynamic graphs. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • van de Meent, J.-W., Paige, B., Yang, H., and Wood, F. An introduction to probabilistic programming. arXiv preprint arXiv:1809.10756, 2018.
  • Van der Heijden, M., Velikova, M., and Lucas, P. J. Learning Bayesian networks for clinical time series analysis. Journal of Biomedical Informatics, 2014.
  • Wang, Y., Smola, A., Maddix, D. C., Gasthaus, J., Foster, D., and Januschowski, T. Deep factors for forecasting. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
  • Weston, J., Chopra, S., and Bordes, A. Memory networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • Williams, R. J. and Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
  • Xiao, C., Teichmann, C., and Arkoudas, K. Grammatical sequence prediction for real-time neural semantic parsing. In Proceedings of the ACL Workshop on Deep Learning and Formal Languages: Building Bridges, 2019.
  • Xu, D., Ruan, C., Korpeoglu, E., Kumar, S., and Achan, K. Inductive representation learning on temporal graphs. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.