Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
ACL, 2017.
Abstract:
End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates.
Introduction
- Task-oriented dialog systems help a user to accomplish some goal using natural language, such as making a restaurant reservation, getting technical support, or placing a phone call.
- These dialog systems have been built as a pipeline, with modules for language understanding, state tracking, action selection, and language generation.
- In some practical settings, programmed constraints are essential – for example, a banking dialog system would require that a user is logged in before they can retrieve account information
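A constraint like the login example above can be expressed directly in code as an action mask that the learned model never has to discover from data. A minimal sketch (the action names and state representation here are invented for illustration, not taken from the paper):

```python
# Hypothetical action inventory for a banking dialog system.
ACTIONS = ["ask_login", "retrieve_account_info", "greet", "say_goodbye"]

def action_mask(state):
    """Return 1.0 for each action currently permitted, 0.0 otherwise.

    Hand-coded constraint: account information may only be retrieved
    after the user has logged in.
    """
    mask = {a: 1.0 for a in ACTIONS}
    if not state.get("logged_in", False):
        mask["retrieve_account_info"] = 0.0
    return [mask[a] for a in ACTIONS]

print(action_mask({"logged_in": False}))  # [1.0, 0.0, 1.0, 1.0]
print(action_mask({"logged_in": True}))   # [1.0, 1.0, 1.0, 1.0]
```

The mask is multiplied into the model's output distribution at each turn, so forbidden actions receive zero probability regardless of what the RNN has learned.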
Highlights
- Task-oriented dialog systems help a user to accomplish some goal using natural language, such as making a restaurant reservation, getting technical support, or placing a phone call
- This paper presents a model for end-to-end learning, called Hybrid Code Networks (HCNs) which addresses these problems
- This shows that supervised learning dialogs can be introduced as reinforcement learning is in progress – i.e., that it is possible to interleave reinforcement learning and supervised learning
- This is an attractive property for practical systems: if a dialog error is spotted by a developer while reinforcement learning is in progress, it is natural to add a training dialog to the training set
- This paper has introduced Hybrid Code Networks for end-to-end learning of task-oriented dialog systems
- Hybrid Code Networks support a separation of concerns where procedural knowledge and constraints can be expressed in software, and the control flow is learned
Results
- Results are shown in Figure 4
- This shows that SL dialogs can be introduced as RL is in progress – i.e., that it is possible to interleave RL and SL.
- This is an attractive property for practical systems: if a dialog error is spotted by a developer while RL is in progress, it is natural to add a training dialog to the training set
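The interleaving of SL and RL updates can be sketched on a toy softmax policy. This is a schematic illustration only: the update rules are the standard cross-entropy and REINFORCE gradients, while the dimensions, learning rate, and rewards are arbitrary, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_actions, lr = 6, 3, 0.1
W = np.zeros((n_features, n_actions))  # toy linear softmax policy

def policy(x, W):
    z = x @ W
    p = np.exp(z - z.max())
    return p / p.sum()

def sl_step(W, x, labeled_action):
    """Supervised step: cross-entropy gradient toward the labeled action."""
    p = policy(x, W)
    grad = np.outer(x, p)
    grad[:, labeled_action] -= x          # x * (p - onehot(labeled_action))
    return W - lr * grad

def rl_step(W, x, sampled_action, reward):
    """REINFORCE step: log-likelihood gradient scaled by the return."""
    p = policy(x, W)
    grad = np.outer(x, p)
    grad[:, sampled_action] -= x
    return W - lr * reward * grad

# Interleave: whenever a developer adds a labeled dialog turn,
# apply an SL step in between RL episodes.
for t in range(50):
    x = rng.standard_normal(n_features)
    a = int(rng.integers(n_actions))
    W = rl_step(W, x, a, reward=rng.random())
    if t % 10 == 0:                        # a freshly labeled training turn
        W = sl_step(W, x, labeled_action=0)
```

Both updates modify the same parameters, which is what makes it natural to fold a newly labeled dialog into training while RL optimization is underway.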
Conclusion
- This paper has introduced Hybrid Code Networks for end-to-end learning of task-oriented dialog systems.
- HCNs support a separation of concerns where procedural knowledge and constraints can be expressed in software, and the control flow is learned.
- Compared to existing end-to-end approaches, HCNs afford more developer control and require less training data, at the expense of a small amount of developer effort
Tables
- Table1: Results on bAbI dialog Task5-OOV and Task6 (Bordes and Weston, 2016). Results for “Rules” taken from Bordes and Weston (2016). Note that, unlike cited past work, HCNs make use of domain-specific procedural knowledge
- Table2: Basic statistics of labeled customer support dialogs. Test accuracy refers to whole-dialog accuracy of the existing rule-based system
- Table3: Dimensions of the 5 HCNs in this paper
- Table4: Binary context features used to convey entity and database state in Section 4
Related work
- Broadly there are two lines of work applying machine learning to dialog control. The first decomposes a dialog system into a pipeline, typically including language understanding, dialog state tracking, action selection policy, and language generation (Levin et al., 2000; Singh et al., 2002; Williams and Young, 2007; Williams, 2008; Hori et al., 2009; Lee et al., 2009; Griol et al., 2008; Young et al., 2013; Li et al., 2014). Specifically related to HCNs, past work has implemented the policy as feed-forward neural networks (Wen et al., 2016), trained with supervised learning followed by reinforcement learning (Su et al., 2016). In these works, the policy has not been recurrent – i.e., the policy depends on the state tracker to summarize observable dialog history into state features, which requires design and specialized labeling. By contrast, HCNs use an RNN which automatically infers a representation of state. For learning efficiency, HCNs use an external lightweight process for tracking entity values, but the policy is not strictly dependent on it: as an illustration, in Section 5 below, we demonstrate an HCN-based dialog system which has no external state tracker. If there is context which is not apparent in the text of the dialog, such as database status, this can be encoded as a context feature to the RNN.
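As an illustration of that last point, non-textual context such as database status can simply be appended to the per-turn feature vector fed to the RNN. A minimal sketch, where the vocabulary and context-flag names are invented for illustration:

```python
import numpy as np

# Hypothetical vocabulary and binary context flags.
VOCAB = ["book", "table", "restaurant", "cancel"]
CONTEXT_FLAGS = ["db_has_results", "db_query_pending"]

def featurize(utterance, context):
    """Concatenate a bag-of-words vector with binary context features."""
    words = utterance.lower().split()
    bow = np.array([1.0 if w in words else 0.0 for w in VOCAB])
    ctx = np.array([1.0 if context.get(f, False) else 0.0
                    for f in CONTEXT_FLAGS])
    return np.concatenate([bow, ctx])  # one RNN input per dialog turn

x = featurize("book a table", {"db_has_results": True})
print(x)  # [1. 1. 0. 0. 1. 0.]
```

Because the context flags are just extra input dimensions, the policy can condition on database state without any dedicated state-tracking module.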
Contributions
- Introduces Hybrid Code Networks, which combine an RNN with domain-specific knowledge encoded as software and system action templates
- Presents a model for end-to-end learning, called Hybrid Code Networks (HCNs), which addresses these problems
- Demonstrates an HCN-based dialog system which has no external state tracker
Reference
- Antoine Bordes and Jason Weston. 2016. Learning end-to-end goal-oriented dialog. CoRR abs/1605.07683. http://arxiv.org/abs/1605.07683.
- François Chollet. 2015. Keras. https://github.com/fchollet/keras.
- Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proc NIPS 2014 Deep Learning and Representation Learning Workshop.
- Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. 2017. Towards end-to-end reinforcement learning of dialogue agents for information access. In Proc Association for Computational Linguistics, Vancouver, Canada.
- Mihail Eric and Christopher D Manning. 2017. https://arxiv.org/abs/1701.04024.
- David Griol, Lluís F. Hurtado, Encarna Segarra, and Emilio Sanchis. 2008. A statistical approach to spoken dialog systems design and evaluation. Speech Communication 50(8–9).
- Matthew Henderson, Blaise Thomson, and Jason Williams. 2014a. The second dialog state tracking challenge. In Proc SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA.
- Matthew Henderson, Blaise Thomson, and Steve Young. 2014b. Word-based Dialog State Tracking with Recurrent Neural Networks. In Proc SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
- Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, and Satoshi Nakamura. 2009. Statistical dialog management applied to WFSTbased dialog systems. In Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. pages 4793–4796. https://doi.org/10.1109/ICASSP.2009.4960703.
- Filip Jurčíček, Blaise Thomson, and Steve Young. 2011. Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs. ACM Transactions on Speech and Language Processing (TSLP) 7(3):6.
- Nate Kohl and Peter Stone. 2004. Policy gradient reinforcement learning for fast quadrupedal locomotion. In Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on. IEEE, volume 3, pages 2619–2624.
- Cheongjae Lee, Sangkeun Jung, Seokhwan Kim, and Gary Geunbae Lee. 2009. Example-based dialog modeling for practical multi-domain dialog system. Speech Communication 51(5):466–484.
- Esther Levin, Roberto Pieraccini, and Wieland Eckert. 2000. A stochastic model of human-machine interaction for learning dialogue strategies. IEEE Trans on Speech and Audio Processing 8(1):11–23.
- Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proc HLT-NAACL, San Diego, California, USA.
- Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016b. A persona-based neural conversation model. In Proc Association for Computational Linguistics, Berlin, Germany.
- Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016c. Deep reinforcement learning for dialogue generation. In Proc Conference on Empirical Methods in Natural Language Processing, Austin, Texas, USA.
- Lihong Li, He He, and Jason D. Williams. 2014. Temporal supervised learning for inferring a dialog policy from example conversations. In Proc IEEE Workshop on Spoken Language Technologies (SLT), South Lake Tahoe, Nevada, USA.
- Fei Liu and Julien Perez. 2016. Gated end-toend memory networks. CoRR abs/1610.04211. http://arxiv.org/abs/1610.04211.
- Ryan Thomas Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau. 2017. Training end-to-end dialogue systems with the ubuntu dialogue corpus. Dialogue and Discourse 8(1).
- Yi Luan, Yangfeng Ji, and Mari Ostendorf. 2016. CoRR abs/1603.09457. http://arxiv.org/abs/1603.09457.
- Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. Coherent dialogue with attention-based language models. CoRR abs/1611.06997. http://arxiv.org/abs/1611.06997.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proc Advances in Neural Information Processing Systems, Lake Tahoe, USA. pages 3111– 3119.
- Min Joon Seo, Hannaneh Hajishirzi, and Ali Farhadi. 2016. Query-regression networks for machine comprehension. CoRR abs/1606.04582. http://arxiv.org/abs/1606.04582.
- Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, AAAI’16, pages 3776–3783. http://dl.acm.org/citation.cfm?id=3016387.3016435.
- Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues.
- Lifeng Shang, Zhengdong Lu,, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proc Association for Computational Linguistics, Beijing, China.
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
- Satinder Singh, Diane J Litman, Michael Kearns, and Marilyn A Walker. 2002. Optimizing dialogue management with reinforcement learning: experiments with the NJFun system. Journal of Artificial Intelligence 16:105–133.
- Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng, and Chris Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA.
- Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proc HLT-NAACL, Denver, Colorado, USA.
- Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina RojasBarahona, Stefan Ultes, David Vandyke, TsungHsien Wen, and Steve Young. 2016. Continuously learning neural dialogue management. In arXiv preprint: 1606.02689.
- Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-to-end memory networks. In Proc Advances in Neural Information Processing Systems (NIPS), Montreal, Canada.
- Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints abs/1605.02688. http://arxiv.org/abs/1605.02688.
- Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In Proc ICML Deep Learning Workshop.
- Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, and Steve J. Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. CoRR abs/1604.04562. http://arxiv.org/abs/1604.04562.
- Jason D. Williams. 2008. The best of both worlds: Unifying conventional dialog systems and POMDPs. In Proc Intl Conf on Spoken Language Processing (ICSLP), Brisbane, Australia.
- Jason D. Williams and Steve Young. 2007. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language 21(2):393–422.
- Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
- Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. 2016. Incorporating loosestructured knowledge into LSTM with recall gate for conversation modeling. CoRR abs/1605.05110. http://arxiv.org/abs/1605.05110.
- Kaisheng Yao, Geoffrey Zweig, and Baolin Peng. 2015. Attention with intention for a neural network conversation model. In Proc NIPS workshop on Machine Learning for Spoken Language Understanding and Interaction.
- Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based Statistical Spoken Dialogue Systems: a Review. Proceedings of the IEEE PP(99):1–20.
- Matthew D. Zeiler. 2012. ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701. http://arxiv.org/abs/1212.5701.
- The RNN was specified using Keras version 0.3.3, with back-end computation in Theano version 0.8.0.dev0 (Theano Development Team, 2016; Chollet, 2015). The Keras model specification is given below. The input variable obs includes all features from Figure 1 step 6 except for the previous action (step 18) and the action mask (step 6, top-most vector).
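The Keras listing itself is not reproduced on this page. As a rough stand-in, the following is a minimal numpy sketch of the final stage the text describes: a dense layer over the recurrent state, a softmax over action templates, and an elementwise action mask with renormalization. All dimensions, weights, and variable names here are hypothetical, not the paper's code.

```python
import numpy as np

def masked_action_distribution(h, W_out, b_out, action_mask):
    """Dense layer over the RNN state, softmax over action templates,
    then elementwise mask and renormalize over permitted actions."""
    logits = h @ W_out + b_out
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    p = p * action_mask                 # zero out forbidden actions
    return p / p.sum()                  # renormalize

rng = np.random.default_rng(0)
state_dim, n_actions = 8, 4             # arbitrary toy sizes
h = rng.standard_normal(state_dim)      # stand-in for the LSTM state
W_out = rng.standard_normal((state_dim, n_actions))
b_out = rng.standard_normal(n_actions)
mask = np.array([1.0, 0.0, 1.0, 1.0])   # action 1 is currently forbidden

p = masked_action_distribution(h, W_out, b_out, mask)
print(p)  # a distribution over the three permitted actions
```

In the full system this distribution is either sampled from (during RL) or argmaxed (during SL evaluation), with the mask supplied by the developer's code at each turn.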