Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning.

ACL (2017)

Abstract

End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. …

Introduction
  • Task-oriented dialog systems help a user to accomplish some goal using natural language, such as making a restaurant reservation, getting technical support, or placing a phone call.
  • These dialog systems have been built as a pipeline, with modules for language understanding, state tracking, action selection, and language generation.
  • In some practical settings, programmed constraints are essential – for example, a banking dialog system would require that a user is logged in before they can retrieve account information (a sketch of such a constraint, expressed as an action mask, follows below).
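As an illustration of how such a programmed constraint can be expressed in software alongside a learned policy, here is a minimal Python sketch of a developer-coded action mask for the banking example above. The action names and state fields are hypothetical, not taken from the paper; only the mechanism (masking the policy's distribution over action templates before an action is chosen) reflects the HCN design described here.

    import numpy as np

    # Hypothetical action templates for a banking domain.
    ACTIONS = ["ask_login", "report_balance", "offer_more_help"]

    def action_mask(dialog_state):
        """Developer-coded constraint: account information can only be
        retrieved once the user is logged in."""
        mask = np.ones(len(ACTIONS))
        if not dialog_state.get("logged_in", False):
            mask[ACTIONS.index("report_balance")] = 0.0
        return mask

    def choose_action(policy_probs, dialog_state):
        """Mask the learned policy's distribution over action templates,
        renormalize, and pick the highest-probability allowed action."""
        masked = policy_probs * action_mask(dialog_state)
        masked /= masked.sum()
        return ACTIONS[int(np.argmax(masked))]

    # The policy prefers reporting the balance, but the mask forces a
    # login prompt until the user has authenticated.
    probs = np.array([0.2, 0.7, 0.1])
    print(choose_action(probs, {"logged_in": False}))  # -> "ask_login"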
Highlights
  • Task-oriented dialog systems help a user to accomplish some goal using natural language, such as making a restaurant reservation, getting technical support, or placing a phone call
  • This paper presents a model for end-to-end learning, called Hybrid Code Networks (HCNs), which addresses these problems
  • This shows that supervised learning dialogs can be introduced as reinforcement learning is in progress – i.e., that it is possible to interleave reinforcement learning and supervised learning
  • This is an attractive property for practical systems: if a dialog error is spotted by a developer while reinforcement learning is in progress, it is natural to add a training dialog to the training set
  • This paper has introduced Hybrid Code Networks for end-to-end learning of task-oriented dialog systems
  • Hybrid Code Networks support a separation of concerns where procedural knowledge and constraints can be expressed in software, and the control flow is learned
Results
  • Results are shown in Figure 4
  • This shows that SL dialogs can be introduced as RL is in progress – i.e., that it is possible to interleave RL and SL.
  • This is an attractive property for practical systems: if a dialog error is spotted by a developer while RL is in progress, it is natural to add a training dialog to the training set (a toy sketch of this interleaving follows below)
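The following toy Python sketch illustrates the kind of interleaving described above: a linear softmax policy is updated with REINFORCE-style policy-gradient steps, and a supervised cross-entropy step is applied whenever a labeled example becomes available. The environment, reward, and update schedule are invented placeholders; the paper's actual setup uses an LSTM policy over dialog features.

    import numpy as np

    rng = np.random.default_rng(0)
    N_FEATURES, N_ACTIONS = 4, 3
    W = np.zeros((N_FEATURES, N_ACTIONS))        # toy linear policy weights

    def policy(x):
        z = x @ W
        e = np.exp(z - z.max())
        return e / e.sum()                        # softmax over actions

    def reinforce_update(x, a, reward, lr=0.1):
        """RL step: increase the log-probability of the taken action in
        proportion to the observed reward (REINFORCE)."""
        global W
        grad_logp = -np.outer(x, policy(x))
        grad_logp[:, a] += x
        W += lr * reward * grad_logp

    def supervised_update(x, a_label, lr=0.1):
        """SL step: cross-entropy gradient toward the labeled action."""
        global W
        target = np.zeros(N_ACTIONS)
        target[a_label] = 1.0
        W += lr * np.outer(x, target - policy(x))

    # Interleave RL on sampled experience with occasional SL updates,
    # as when a developer adds a corrected dialog while RL is running.
    for step in range(1000):
        x = rng.normal(size=N_FEATURES)
        a = rng.choice(N_ACTIONS, p=policy(x))
        reward = 1.0 if a == int(x[0] > 0) else 0.0   # toy reward signal
        reinforce_update(x, a, reward)
        if step % 50 == 0:                            # a new labeled example arrives
            supervised_update(x, int(x[0] > 0))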
Conclusion
  • This paper has introduced Hybrid Code Networks for end-to-end learning of task-oriented dialog systems.
  • HCNs support a separation of concerns where procedural knowledge and constraints can be expressed in software, and the control flow is learned.
  • Compared to existing end-to-end approaches, HCNs afford more developer control and require less training data, at the expense of a small amount of developer effort
Tables
  • Table 1: Results on bAbI dialog Task5-OOV and Task6 (Bordes and Weston, 2016). Results for “Rules” taken from Bordes and Weston (2016). Note that, unlike cited past work, HCNs make use of domain-specific procedural knowledge
  • Table 2: Basic statistics of labeled customer support dialogs. Test accuracy refers to whole-dialog accuracy of the existing rule-based system
  • Table 3: Dimensions of the 5 HCNs in this paper
  • Table 4: Binary context features used to convey entity and database state in Section 4
Related work
  • Broadly there are two lines of work applying machine learning to dialog control. The first decomposes a dialog system into a pipeline, typically including language understanding, dialog state tracking, action selection policy, and language generation (Levin et al., 2000; Singh et al., 2002; Williams and Young, 2007; Williams, 2008; Hori et al., 2009; Lee et al., 2009; Griol et al., 2008; Young et al., 2013; Li et al., 2014). Specifically related to HCNs, past work has implemented the policy as feed-forward neural networks (Wen et al., 2016), trained with supervised learning followed by reinforcement learning (Su et al., 2016). In these works, the policy has not been recurrent – i.e., the policy depends on the state tracker to summarize observable dialog history into state features, which requires design and specialized labeling. By contrast, HCNs use an RNN which automatically infers a representation of state. For learning efficiency, HCNs use an external lightweight process for tracking entity values, but the policy is not strictly dependent on it: as an illustration, in Section 5 below, we demonstrate an HCN-based dialog system which has no external state tracker. If there is context which is not apparent in the text in the dialog, such as database status, this can be encoded as a context feature to the RNN, as the sketch below illustrates.
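To make that last point concrete, the sketch below shows one way such context (database status and tracked entity state) could be encoded as binary features and appended to the per-turn RNN input. The feature names are illustrative placeholders; the features actually used in the paper are listed in Table 4.

    import numpy as np

    def context_features(tracked_entities, db_results):
        """Binary context features conveying entity and database state.
        These are concatenated with the per-turn utterance features
        (and the action mask) to form the RNN input."""
        return np.array([
            1.0 if "account_id" in tracked_entities else 0.0,  # entity captured?
            1.0 if db_results is not None else 0.0,            # database queried?
            1.0 if db_results else 0.0,                        # query returned rows?
        ], dtype=np.float32)

    def rnn_input(utterance_features, tracked_entities, db_results):
        """Assemble the full feature vector for one dialog turn."""
        return np.concatenate(
            [utterance_features, context_features(tracked_entities, db_results)]
        )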
Funding
  • Introduces Hybrid Code Networks, which combine an RNN with domain-specific knowledge encoded as software and system action templates
  • Presents a model for end-to-end learning, called Hybrid Code Networks, which addresses these problems
  • Demonstrates an HCN-based dialog system which has no external state tracker
References
  • Antoine Bordes and Jason Weston. 2016. Learning end-to-end goal-oriented dialog. CoRR abs/1605.07683. http://arxiv.org/abs/1605.07683.
  • François Chollet. 2015. Keras. https://github.com/fchollet/keras.
  • Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proc NIPS 2014 Deep Learning and Representation Learning Workshop.
  • Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. 2017. Towards end-to-end reinforcement learning of dialogue agents for information access. In Proc Association for Computational Linguistics, Vancouver, Canada.
  • Mihail Eric and Christopher D. Manning. 2017. CoRR abs/1701.04024. https://arxiv.org/abs/1701.04024.
  • David Griol, Lluís F. Hurtado, Encarna Segarra, and Emilio Sanchis. 2008. A statistical approach to spoken dialog systems design and evaluation. Speech Communication 50(8–9).
  • Matthew Henderson, Blaise Thomson, and Jason Williams. 2014a. The second dialog state tracking challenge. In Proc SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA.
  • Matthew Henderson, Blaise Thomson, and Steve Young. 2014b. Word-based Dialog State Tracking with Recurrent Neural Networks. In Proc SIGdial Workshop on Discourse and Dialogue, Philadelphia, USA.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, and Satoshi Nakamura. 2009. Statistical dialog management applied to WFST-based dialog systems. In Proc IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4793–4796. https://doi.org/10.1109/ICASSP.2009.4960703.
  • Filip Jurčíček, Blaise Thomson, and Steve Young. 2011. Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs. ACM Transactions on Speech and Language Processing (TSLP) 7(3):6.
  • Nate Kohl and Peter Stone. 2004. Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proc IEEE International Conference on Robotics and Automation (ICRA), volume 3, pages 2619–2624.
  • Cheongjae Lee, Sangkeun Jung, Seokhwan Kim, and Gary Geunbae Lee. 2009. Example-based dialog modeling for practical multi-domain dialog system. Speech Communication 51(5):466–484.
  • Esther Levin, Roberto Pieraccini, and Wieland Eckert. 2000. A stochastic model of human-machine interaction for learning dialogue strategies. IEEE Trans on Speech and Audio Processing 8(1):11–23.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proc HLT-NAACL, San Diego, California, USA.
  • Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016b. A persona-based neural conversation model. In Proc Association for Computational Linguistics, Berlin, Germany.
  • Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016c. Deep reinforcement learning for dialogue generation. In Proc Conference on Empirical Methods in Natural Language Processing, Austin, Texas, USA.
  • Lihong Li, He He, and Jason D. Williams. 2014. Temporal supervised learning for inferring a dialog policy from example conversations. In Proc IEEE Workshop on Spoken Language Technologies (SLT), South Lake Tahoe, Nevada, USA.
  • Fei Liu and Julien Perez. 2016. Gated end-to-end memory networks. CoRR abs/1610.04211. http://arxiv.org/abs/1610.04211.
  • Ryan Thomas Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau. 2017. Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus. Dialogue and Discourse 8(1).
  • Yi Luan, Yangfeng Ji, and Mari Ostendorf. 2016. CoRR abs/1603.09457. http://arxiv.org/abs/1603.09457.
  • Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. Coherent dialogue with attention-based language models. CoRR abs/1611.06997. http://arxiv.org/abs/1611.06997.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proc Advances in Neural Information Processing Systems, Lake Tahoe, USA, pages 3111–3119.
  • Min Joon Seo, Hannaneh Hajishirzi, and Ali Farhadi. 2016. Query-regression networks for machine comprehension. CoRR abs/1606.04582. http://arxiv.org/abs/1606.04582.
  • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proc AAAI Conference on Artificial Intelligence (AAAI’16), pages 3776–3783. http://dl.acm.org/citation.cfm?id=3016387.3016435.
  • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. A hierarchical latent variable encoder-decoder model for generating dialogues.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proc Association for Computational Linguistics, Beijing, China.
  • David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
  • Satinder Singh, Diane J. Litman, Michael Kearns, and Marilyn A. Walker. 2002. Optimizing dialogue management with reinforcement learning: experiments with the NJFun system. Journal of Artificial Intelligence Research 16:105–133.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng, and Chris Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA.
  • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In Proc HLT-NAACL, Denver, Colorado, USA.
  • Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Continuously learning neural dialogue management. arXiv preprint arXiv:1606.02689.
  • Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-to-end memory networks. In Proc Advances in Neural Information Processing Systems (NIPS), Montreal, Canada.
  • Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints abs/1605.02688. http://arxiv.org/abs/1605.02688.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In Proc ICML Deep Learning Workshop.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, and Steve J. Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. CoRR abs/1604.04562. http://arxiv.org/abs/1604.04562.
  • Jason D. Williams. 2008. The best of both worlds: Unifying conventional dialog systems and POMDPs. In Proc Intl Conf on Spoken Language Processing (ICSLP), Brisbane, Australia.
  • Jason D. Williams and Steve Young. 2007. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language 21(2):393–422.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4):229–256.
  • Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. 2016. Incorporating loose-structured knowledge into LSTM with recall gate for conversation modeling. CoRR abs/1605.05110. http://arxiv.org/abs/1605.05110.
  • Kaisheng Yao, Geoffrey Zweig, and Baolin Peng. 2015. Attention with intention for a neural network conversation model. In Proc NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction.
  • Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based Statistical Spoken Dialogue Systems: a Review. Proceedings of the IEEE PP(99):1–20.
  • Matthew D. Zeiler. 2012. ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701. http://arxiv.org/abs/1212.5701.
  • The RNN was specified using Keras version 0.3.3, with back-end computation in Theano version 0.8.0.dev0 (Theano Development Team, 2016; Chollet, 2015). The Keras model specification referred to here is not reproduced on this page. The input variable obs includes all features from Figure 1 step 6 except for the previous action (step 18) and the action mask (step 6, top-most vector).
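As a stand-in for the missing specification, the following is only a rough illustrative sketch, written against the modern Keras (TensorFlow) functional API rather than the Keras 0.3.3/Theano setup the authors used: an LSTM over the per-turn obs features, a dense softmax over action templates, and the developer-supplied action mask applied to the output. All layer sizes and dimensions are placeholders, not the authors' choices.

    # Illustrative sketch only; not the authors' model specification.
    from tensorflow import keras
    from tensorflow.keras import layers

    N_FEATURES = 300   # per-turn feature size (placeholder)
    N_ACTIONS = 16     # number of action templates (placeholder)
    HIDDEN = 128       # LSTM hidden size (placeholder)

    # Per-turn observation features and the developer-coded action mask.
    obs = keras.Input(shape=(None, N_FEATURES), name="obs")
    mask = keras.Input(shape=(None, N_ACTIONS), name="action_mask")

    h = layers.LSTM(HIDDEN, return_sequences=True)(obs)       # dialog-level recurrence
    probs = layers.Dense(N_ACTIONS, activation="softmax")(h)  # distribution over templates

    # Zero out disallowed action templates and renormalize.
    masked = layers.Multiply()([probs, mask])
    normalized = layers.Lambda(
        lambda x: x / (keras.backend.sum(x, axis=-1, keepdims=True) + 1e-8)
    )(masked)

    model = keras.Model(inputs=[obs, mask], outputs=normalized)
    model.compile(optimizer="adadelta", loss="categorical_crossentropy")
    model.summary()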