Key-Value Retrieval Networks for Task-Oriented Dialogue
Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 37–49, 2017
- With the success of new speech-based human-computer interfaces, there is a great need for effective task-oriented dialogue agents that can handle everyday tasks such as scheduling events and booking hotels.
- Current commercial dialogue agents are often brittle pattern-matching systems which are unable to maintain the kind of flexible conversations that people desire.
- Neural dialogue agents present one of the most promising avenues for leveraging dialogue corpora to build statistical models directly from data by using powerful distributed representations (Bordes and Weston, 2016; Wen et al, 2016b; Dhingra et al, 2016).
- The main contributions of our work are two-fold: 1) We introduce the Key-Value Retrieval Network, a highly performant neural task-oriented dialogue agent that is able to smoothly incorporate information from underlying knowledge bases through a novel key-value retrieval mechanism
- Though prior work has shown that automatic evaluation metrics often correlate poorly with human assessments of dialogue agents (Liu et al., 2016), we report a number of automatic metrics in Table 3
- Our rule-based model has the lowest BLEU score, which is a consequence of the fact that the naturalness of the system output is very limited by the number of diverse and distinct response templates we manually provided. This is a common issue with heuristic dialogue agents and one that could be partially alleviated through a larger collection of lexically rich response templates.
- We have presented a novel neural task-oriented dialogue model that is able to sustain grounded discourse across a variety of domains by retrieving world knowledge represented in knowledge bases
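The key-value retrieval idea named above can be sketched in a few lines. This is an illustrative reconstruction rather than the authors' implementation: the dot-product scoring, the `kb_keys`/`kb_value_ids` names, and the way scores are routed into vocabulary logits are all assumptions.

```python
import numpy as np

def kv_retrieval_logits(hidden, kb_keys, kb_value_ids, vocab_size):
    """Sketch of key-value retrieval over a knowledge base.

    hidden:       decoder hidden state, shape (d,)
    kb_keys:      embedded KB entry keys (e.g. subject + relation), shape (n, d)
    kb_value_ids: vocabulary index of each entry's value token, shape (n,)

    Returns logits over the vocabulary in which each KB value token
    receives its entry's (unnormalized) attention score.
    """
    scores = kb_keys @ hidden            # dot-product attention, shape (n,)
    logits = np.zeros(vocab_size)
    for score, vid in zip(scores, kb_value_ids):
        logits[vid] += score             # route attention mass to value tokens
    return logits

# toy KB with 3 entries whose value tokens map to vocab ids 5, 7, 5
d, vocab_size = 4, 10
rng = np.random.default_rng(0)
hidden = rng.normal(size=d)
kb_keys = rng.normal(size=(3, d))
logits = kv_retrieval_logits(hidden, kb_keys, np.array([5, 7, 5]), vocab_size)
```

At decoding time such KB logits would be added to the ordinary vocabulary logits before the softmax, letting the decoder emit entity values it has attended to.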
- The authors first introduce the details of the experiments and present results from both automatic and human evaluation.
For the experiments, the authors divided the dialogues into train/validation/test sets using a 0.8/0.1/0.1 data split and ensured that each domain type was represented in each of the splits.
To reduce lexical variability, in a pre-processing step the authors map variant surface expressions of entities to a canonical form using named entity recognition and linking.
- The authors' model outputs the canonical forms of the entities, and so the authors realize their surface forms by running the system output through an inverse lexicon.
- The inverse lexicon converts entities back to their surface forms by sampling from a multinomial distribution whose parameters are the frequency counts of each surface form for a given entity, as observed in the training and validation data.
- Note that for the purposes of computing the evaluation metrics, the authors operate on the canonicalized forms, so that any non-deterministic variability in surface form realization does not affect the computed metrics
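The inverse-lexicon step can be sketched as below. The lexicon contents and counts here are made-up toy values; only the sampling scheme, a multinomial over observed surface-form counts, follows the description above.

```python
import random

# Toy lexicon mapping a canonical entity to observed surface forms with
# their frequency counts in training/validation data (values assumed).
lexicon = {
    "dinner_time": {"5 pm": 3, "5pm": 1},
}

def realize(canonical_token, lexicon, rng=random):
    """Replace a canonical entity token with a surface form sampled from a
    multinomial whose parameters are the observed frequency counts."""
    forms = lexicon.get(canonical_token)
    if forms is None:                      # not an entity: emit unchanged
        return canonical_token
    surfaces, counts = zip(*forms.items())
    return rng.choices(surfaces, weights=counts, k=1)[0]

output = [realize(tok, lexicon) for tok in ["dinner", "is", "at", "dinner_time"]]
```

Because evaluation operates on the canonical forms, this sampling step affects only what the user sees, not the computed metrics.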
- The authors see that of the baseline models, Copy Net has the lowest aggregate entity F1 performance.
- The authors' rule-based model has the lowest BLEU score, which is a consequence of the fact that the naturalness of the system output is very limited by the number of diverse and distinct response templates the authors manually provided
- This is a common issue with heuristic dialogue agents and one that could be partially alleviated through a larger collection of lexically rich response templates.
- This is because the rule-based model was designed to accurately parse the semantics of user utterances and query the underlying KB of the dialogue through manually-provided heuristics
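The aggregated entity F1 discussed above can be computed roughly as follows. This micro-averaged formulation over per-response entity sets is a sketch; the exact matching and aggregation rules used in the paper may differ.

```python
def entity_f1(predicted, gold):
    """Micro-averaged F1 over KB entities mentioned in system responses.

    predicted, gold: lists of entity sets, one set per response.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # entities correctly produced
        fp += len(pred - ref)   # spurious entities
        fn += len(ref - pred)   # missed entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# toy example: one spurious entity in the first response,
# one missed entity in the second
score = entity_f1(
    predicted=[{"5pm", "football"}, {"whole_foods"}],
    gold=[{"5pm"}, {"whole_foods", "123_main_st"}],
)  # precision = recall = 2/3, so F1 = 2/3
```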
- Conclusion and Future Work
In this work, the authors have presented a novel neural task-oriented dialogue model that is able to sustain grounded discourse across a variety of domains by retrieving world knowledge represented in knowledge bases.
- Table 1: Slot types and number of distinct slot values for different domains. POI denotes point-of-interest
- Table 2: Statistics of the dataset
- Table 3: Evaluation on our test data. Bold values indicate best model performance. We provide both an aggregated F1 score as well as domain-specific F1 scores. Attn. Seq2Seq refers to a sequence-to-sequence model with encoder attention. KV Retrieval Net (no enc. attn.) refers to our new model with no encoder attention context vector computed during decoding
- Table 4: Human evaluation results on real-time dialogues
- Table 5: Human evaluation of system outputs on the test set
- Task-oriented agents for spoken dialogue systems have been the subject of extensive research effort. One line of work (Young et al., 2013) has tackled the problem using partially observable Markov decision processes and reinforcement learning with carefully designed action spaces, though the number of distinct action states often makes this approach brittle and computationally intractable.
The recent successes of neural architectures on a number of traditional natural language processing subtasks (Bahdanau et al., 2015; Sutskever et al., 2014; Vinyals et al., 2015) have motivated investigation into dialogue agents that can effectively make use of distributed neural representations for dialogue state management, belief tracking, and response generation. Recent work by Wen et al. (2016b) has built systems with modularly-connected representation, belief state, and generation components. These models learn to explicitly represent user intent through intermediate supervision, which breaks end-to-end trainability. Other work (Bordes and Weston, 2016; Liu and Perez, 2016) stores dialogue context in a memory module and repeatedly queries and reasons about this context to select an adequate system response from a set of all candidate responses.
- We gratefully acknowledge the funding of the Ford Research and Innovation Center, under Grant No. 124344.
- L. El Asri, H. Schulz, S. Sharma, J. Zumer, J. Harris, E. Fine, R. Mehrotra, and K. Suleman. 2017. Frames: A corpus for adding memory to goal-oriented dialogue systems. http://www.maluuba.com/publications/.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR2015).
- Christina Bennett and Alexander I. Rudnicky. 2002. The Carnegie Mellon Communicator corpus. In ICSLP.
- Antoine Bordes and Jason Weston. 2016. Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683.
- Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. 2016. End-to-end reinforcement learning of dialogue agents for information access. arXiv preprint arXiv:1609.00777.
- Mihail Eric and Christopher Manning. 2017. A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, pages 468–473. http://aclweb.org/anthology/E17-2075.
- Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 1631–1640. http://www.aclweb.org/anthology/P16-1154.
- Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 140–149. http://www.aclweb.org/anthology/P16-1014.
- C.T. Hemphill, J. J. Godfrey, and G. R. Doddington. 1990. The ATIS spoken language systems pilot corpus. In Proceedings of the DARPA speech and natural language workshop.
- Matthew Henderson, Blaise Thomson, and Jason Williams. 2014a. The second dialog state tracking challenge. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 263.
- Matthew Henderson, Blaise Thomson, and Steve Young. 2014b. Word-based dialog state tracking with recurrent neural networks. In Proceedings of the SIGDIAL 2014 Conference.
- Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
- Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 12–22. http://www.aclweb.org/anthology/P16-1002.
- Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufman, Balint Miklos, Greg Corrado, Andrew Tomkins, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. 2016. Smart reply: Automated response suggestion for email. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). https://arxiv.org/pdf/1606.04870v1.pdf.
- Diederik Kingma and Jimmy Ba. 2015. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR2015).
- Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 110–119. http://www.aclweb.org/anthology/N16-1014.
- Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, Fumin Wang, and Andrew Senior. 2016. Latent predictor networks for code generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 599–609. http://www.aclweb.org/anthology/P16-1057.
- Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 2122–2132. https://aclweb.org/anthology/D16-1230.
- Fei Liu and Julien Perez. 2016. Gated end-to-end memory networks.
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015a. Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing, pages 1412–1421.
- Alexander Miller, Adam Fisch, Jesse Dodge, AmirHossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 1400–1409. https://aclweb.org/anthology/D16-1147.
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pages 311–318. https://doi.org/10.3115/1073083.1073135.
- Vu Pham, Theodore Bluche, Christopher Kermorvant, and Jerome Louradour. 2014. Dropout improves recurrent neural networks for handwriting recognition. arXiv preprint arXiv:1312.4569v2.
- Alan Ritter, Colin Cherry, and William B. Dolan. 2011. Data-driven response generation in social media. In Empirical Methods in Natural Language Processing, pages 583–593.
- Shikhar Sharma, Layla El Asri, Hannes Schulz, and Jeremie Zumer. 2017. Relevance of unsupervised metrics in task-oriented dialogue for evaluating natural language generation. arXiv preprint arXiv:1706.09799.
- David Sussillo and L.F. Abbott. 2015. Random walk initialization for training very deep feed forward networks. arXiv preprint arXiv:1412.6558.
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, Curran Associates, Inc., pages 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf.
- Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. Grammar as a foreign language. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, Curran Associates, Inc., pages 2773–2781. http://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf.
- Tsung-Hsien Wen, David Vandyke, Milica Gasic, Nikola Mrksic, Lina. M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2016b. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.
- Jason D. Williams, Kavosh Asadi, and Geoffrey Zweig. 2017. Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning. arXiv preprint arXiv:1702.03274.
- Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. 2013. The dialog state tracking challenge. In Proceedings of SIGDIAL, Metz, France.
- Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based statistical spoken dialog systems: a review. Proceedings of the IEEE 101(5):1160–1179.