
MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling.

EMNLP (2018): 5016–5026


Abstract

Even though machine learning has become the major scene in the dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over mu...

Introduction
Highlights
  • Conversational Artificial Intelligence (Conversational AI) has been one of the long-standing challenges in computer science and artificial intelligence since the Dartmouth Proposal (McCarthy et al., 1955).
  • Instead of trying to create ambitious conversational agents that reach human-level intelligence, industrial practice has focused on building task-oriented dialogue systems (Young et al., 2013) that help with specific tasks such as flight reservation (Seneff and Polifroni, 2000).
  • These difficulties have led to the same solution: using statistical frameworks and machine learning for the various system components, such as natural language understanding (Henderson et al., 2013; Mesnil et al., 2015; Mrksic et al., 2017a), dialogue management (Gasic and Young, 2014; Tegho et al., 2018), language generation (Wen et al., 2015; Kiddon et al., 2016), and even end-to-end dialogue modelling (Zhao and Eskenazi, 2016; Wen et al., 2017; Eric et al., 2017).
  • We considered various possible dialogue scenarios, ranging from requesting basic information about attractions to booking a hotel room or travelling between cities.
  • The available datasets were usually constrained in linguistic variability or lacked multi-domain use cases.
Results
  • Even with perfect dialogue state tracking of the user intent, the baseline models score almost 30% lower on the Inform metric on the new corpus.
  • The significantly lower metrics on the MultiWOZ corpus show that it is much more challenging than the SFX restaurant dataset.
  • This is probably because more than 60% of the dialogue turns contain at least two system acts, which greatly harms the performance of the existing model.
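The multi-act statistic above comes from a simple count over system turns. The following is a minimal Python sketch. It assumes a hypothetical annotation format in which each system turn maps act names (e.g. `Restaurant-Inform`) to slot-value pairs. The act names are illustrative and are not taken from the corpus release. Acts are counted as distinct keys, so two acts of the same type in one turn count once.

```python
from typing import Dict, List

# Hypothetical annotation format (assumption): act name -> list of [slot, value] pairs.
SystemTurn = Dict[str, List[List[str]]]


def multi_act_share(turns: List[SystemTurn], min_acts: int = 2) -> float:
    """Return the fraction of system turns containing at least `min_acts` distinct acts."""
    if not turns:
        return 0.0
    multi = sum(1 for acts in turns if len(acts) >= min_acts)
    return multi / len(turns)


if __name__ == "__main__":
    example = [
        {"Restaurant-Inform": [["Food", "italian"]], "Restaurant-Request": [["Area", "?"]]},
        {"general-bye": [["none", "none"]]},
        {"Hotel-Recommend": [["Name", "acorn"]], "Booking-Inform": [["none", "none"]]},
    ]
    print(f"{multi_act_share(example):.2f}")  # prints 0.67 (2 of 3 turns)
```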
Conclusion
  • As more and more speech-oriented applications are commercially deployed, the need for entirely data-driven conversational agents becomes more apparent.
  • Various corpora have been gathered to enable data-driven approaches to dialogue modelling.
  • The authors established a data-collection pipeline based entirely on crowd-sourcing, which made it possible to gather a large-scale, linguistically rich corpus of human-human conversations.
  • The scale of the data should help push forward research in end-to-end dialogue modelling.
Tables
  • Table 1: Comparison of our corpus to similar datasets. Numbers in bold indicate the best value for the respective metric. The numbers are provided for the training part of the data, except for the FRAMES dataset, where no such division was defined.
  • Table 2: Full ontology for all domains in our dataset. The superscript indicates which domains each entry belongs to. *: universal, 1: restaurant, 2: hotel, 3: attraction, 4: taxi, 5: train, 6: hospital, 7: police.
  • Table 3: The test set accuracies overall and for joint goals in the restaurant sub-domain.
  • Table 4: Performance comparison of two different model architectures using a corpus-based evaluation.
  • Table 5: The test set slot error rate (SER) and BLEU on the SFX dataset and the MultiWOZ restaurant subset.
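Tables 3 and 5 report two standard dialogue metrics. The sketch below implements both of them.

* Joint goal accuracy counts a turn as correct only if every predicted slot-value pair matches the gold belief state.
* The slot error rate follows Wen et al. (2015): missing plus redundant slots, divided by the total number of slots in the dialogue acts.

The input representations and function names are assumptions made for illustration. This is not the paper's evaluation code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BeliefState = Dict[str, str]  # hypothetical representation: slot name -> value


def joint_goal_accuracy(predicted: List[BeliefState], gold: List[BeliefState]) -> float:
    """Fraction of turns whose predicted belief state matches the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def slot_error_rate(pairs: Iterable[Tuple[List[str], List[str]]]) -> float:
    """SER = (missing + redundant) / total slots required by the dialogue acts.

    Each pair holds the slots required by a dialogue act and the slots realised
    in the generated sentence (slot extraction is assumed and not shown).
    """
    missing = redundant = total = 0
    for required, realised in pairs:
        req, out = Counter(required), Counter(realised)
        missing += sum((req - out).values())    # required but not realised
        redundant += sum((out - req).values())  # realised but not required
        total += len(required)
    return (missing + redundant) / total if total else 0.0


if __name__ == "__main__":
    pred = [{"food": "italian"}, {"food": "thai"}]
    gold = [{"food": "italian"}, {"food": "thai", "area": "north"}]
    print(joint_goal_accuracy(pred, gold))  # 0.5
    print(round(slot_error_rate([(["name", "area"], ["name"]), (["price"], ["price", "price"])]), 2))  # 0.67
```

Using `Counter` subtraction keeps only positive counts. A slot realised twice is therefore penalised once as redundant, and a slot never realised is penalised once as missing.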
Related work
  • Existing datasets can be roughly grouped into three categories: machine-to-machine, human-to-machine, and human-to-human conversations. A detailed review of these categories is presented below.

    http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/

    Machine-to-Machine: Creating an environment with a simulated user makes it possible to exhaustively generate dialogue templates. These templates can be mapped to natural language either by pre-defined rules (Bordes et al., 2017) or by crowd workers (Shah et al., 2018). Such an approach ensures diversity and full coverage of all possible dialogue outcomes within a certain domain. However, the naturalness of the dialogue flows relies entirely on the engineered set-up of the user and system bots. This poses a risk of a mismatch between training data and real interactions, which harms interaction quality. Moreover, these datasets do not take into account the noisy conditions often experienced in real interactions (Black et al., 2011).
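The machine-to-machine recipe can be illustrated with a toy domain:

* A scripted system bot asks for each slot.
* A simulated user answers from its goal.
* Pre-defined rules turn each abstract act into a sentence.

Enumerating every goal gives full coverage by construction, and the rigid, engineered flow shows why naturalness suffers. The domain, act names, and templates below are hypothetical. This is a sketch of the general idea, not the pipeline of any cited work.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy domain: slot -> possible values.
DOMAIN: Dict[str, List[str]] = {
    "food": ["italian", "thai"],
    "area": ["north", "centre"],
}

# Pre-defined rules mapping each abstract act to a surface template (assumed wording).
RULES: Dict[str, str] = {
    "request_food": "What kind of food would you like?",
    "inform_food": "I would like {food} food.",
    "request_area": "Which area do you prefer?",
    "inform_area": "Somewhere in the {area}, please.",
    "offer": "I found an option serving {food} food in the {area}.",
}

Turn = Tuple[str, str]  # (speaker, act name or realised text)


def simulate(goal: Dict[str, str]) -> List[Turn]:
    """Scripted bots: the system requests each slot and the simulated user answers it."""
    outline: List[Turn] = []
    for slot in goal:
        outline.append(("system", f"request_{slot}"))
        outline.append(("user", f"inform_{slot}"))
    outline.append(("system", "offer"))
    return outline


def realise(outline: List[Turn], goal: Dict[str, str]) -> List[Turn]:
    """Map each act to natural language with the rule templates."""
    return [(speaker, RULES[act].format(**goal)) for speaker, act in outline]


def all_dialogues() -> Iterator[List[Turn]]:
    """Enumerate every user goal in the domain, giving full coverage by construction."""
    slots = list(DOMAIN)
    for values in itertools.product(*(DOMAIN[s] for s in slots)):
        goal = dict(zip(slots, values))
        yield realise(simulate(goal), goal)


if __name__ == "__main__":
    for speaker, text in next(all_dialogues()):
        print(f"{speaker}: {text}")
```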
Funding
  • This work was funded by a Google Faculty Research Award (RG91111), an EPSRC studentship (RG80792), an EPSRC grant (EP/M018946/1) and by Toshiba Research Europe Ltd, Cambridge Research Laboratory (RG85875).
Study subjects and analysis
workers: 1249
The responses are also more diverse, which enables the training of more complex generation models. In total, 1,249 workers contributed to the corpus creation, with only a few instances of intentional wrongdoing. Additional automatic checks were added to catch very short utterances, short dialogues, or missing single turns during annotation.
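The automatic checks described above could look roughly like the following sketch. The thresholds, the `Dialogue` structure, and the flag names are all hypothetical, because the paper does not state the exact criteria it used.

```python
from dataclasses import dataclass
from typing import List

# Illustrative thresholds (assumptions, not values from the paper).
MIN_WORDS_PER_UTTERANCE = 3
MIN_TURNS_PER_DIALOGUE = 4


@dataclass
class Dialogue:
    dialogue_id: str
    turns: List[str]  # alternating user/system utterances; "" marks a missing turn


def quality_flags(d: Dialogue) -> List[str]:
    """Return the reasons a dialogue should be sent back for manual review."""
    flags = []
    if len(d.turns) < MIN_TURNS_PER_DIALOGUE:
        flags.append("short_dialogue")
    if any(not t.strip() for t in d.turns):
        flags.append("missing_turn")
    # Empty turns are already covered above, so only non-empty short turns count here.
    if any(0 < len(t.split()) < MIN_WORDS_PER_UTTERANCE for t in d.turns):
        flags.append("very_short_utterance")
    return flags


if __name__ == "__main__":
    print(quality_flags(Dialogue("d1", ["Hi", ""])))
```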

datasets: 2
To give more statistics about the two datasets: the SFX corpus has 9 different act types with 12 slots, compared to 12 acts and 14 slots in our corpus. The best model for both datasets was found through a grid search over a set of hyperparameters, such as the embedding size, the learning rate, and the number of LSTM layers.
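A grid search trains and scores a model for every combination of candidate hyperparameters and keeps the best one. The following is a minimal sketch. It assumes a caller-supplied `evaluate` function (hypothetical) that trains a model and returns a validation score. The candidate values in the usage example are illustrative and are not the ones used in the paper.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, Any]


def grid_search(grid: Dict[str, Iterable[Any]],
                evaluate: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Score every combination in `grid` and return the best config and its score."""
    names = list(grid)
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for values in itertools.product(*(list(grid[n]) for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)  # e.g. train a model and return a validation metric
        if score > best_score:  # ties keep the earlier config
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    grid = {"embedding_size": [50, 150, 300], "learning_rate": [0.001, 0.005], "lstm_layers": [1, 2]}

    def toy_evaluate(cfg: Config) -> float:
        # Stand-in for training plus validation; favours 150-d embeddings and fewer layers.
        return -abs(cfg["embedding_size"] - 150) - cfg["lstm_layers"]

    print(grid_search(grid, toy_evaluate))
```

The number of training runs grows multiplicatively with each added hyperparameter (3 × 2 × 2 = 12 runs here). This is why grids are usually kept small.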

Reference
  • Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, and Kaheer Suleman. 2017. Frames: A corpus for adding memory to goal-oriented dialogue systems. In Proceedings of SIGDIAL.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. ICLR.
  • Alan W Black, Susanne Burger, Alistair Conkie, Helen Hastie, Simon Keizer, Oliver Lemon, Nicolas Merigaud, Gabriel Parent, Gabriel Schubiner, Blaise Thomson, et al. 2011. Spoken dialog challenge 2010: Comparison of live and control test results. In Proceedings of the SIGDIAL 2011 Conference, pages 2–7. Association for Computational Linguistics.
  • Dan Bohus and Alexander I Rudnicky. 2005.
  • Antoine Bordes, Y-Lan Boureau, and Jason Weston. 2017. Learning end-to-end goal-oriented dialog. In Proceedings of ICLR.
  • Paweł Budzianowski, Inigo Casanueva, Bo-Hsiang Tseng, and Milica Gasic. 2018. Towards end-to-end multi-domain dialogue modelling. Tech. Rep. CUED/F-INFENG/TR.706, University of Cambridge, Engineering Department.
  • Harry Bunt. 2006. Dimensions in dialogue act annotation. In Proceedings of LREC, volume 6, pages 919–924.
  • Mihail Eric, Lakshmi Krishnan, Francois Charette, and Christopher D Manning. 2017. Key-value retrieval networks for task-oriented dialogue. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 37–49.
  • Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378.
  • Milica Gasic, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, Martin Szummer, Blaise Thomson, and Steve Young. 2014. Incremental on-line adaptation of POMDP-based dialogue managers to extended domains. In Interspeech.
  • Milica Gasic and Steve Young. 2014. Gaussian processes for POMDP-based dialogue manager optimization. TASLP, 22(1):28–40.
  • Charles T Hemphill, John J Godfrey, and George R Doddington. 1990. The ATIS spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania.
  • M. Henderson, B. Thomson, and J. Williams. 2014a. The second dialog state tracking challenge. In Proceedings of SIGdial.
  • M. Henderson, B. Thomson, and S. J. Young. 2014b. Word-based dialog state tracking with recurrent neural networks. In Proceedings of SIGdial.
  • Matthew Henderson, Blaise Thomson, and Jason D Williams. 2014c. The third dialog state tracking challenge. In Spoken Language Technology Workshop (SLT), 2014 IEEE, pages 324–329. IEEE.
  • Matthew Henderson, Blaise Thomson, and Steve Young. 2013. Deep neural network approach for the dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, pages 467–471.
  • John F Kelley. 1984. An iterative design methodology for user-friendly natural language office information applications. ACM Transactions on Information Systems (TOIS), 2(1):26–41.
  • Chloe Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339.
  • Seokhwan Kim, Luis Fernando D’Haro, Rafael E Banchs, Jason D Williams, Matthew Henderson, and Koichiro Yoshino. 2016. The fifth dialog state tracking challenge. In Spoken Language Technology Workshop (SLT), 2016 IEEE, pages 511–517. IEEE.
  • Seokhwan Kim, Luis Fernando D’Haro, Rafael E Banchs, Jason D Williams, and Matthew Henderson. 2017. The fourth dialog state tracking challenge. In Dialogues with Social Robots, pages 435–449. Springer.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL-HLT, pages 110–119, San Diego, California. Association for Computational Linguistics.
  • Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, and Asli Celikyilmaz. 2017. End-to-end task-completion neural dialogue systems. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 733–743.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132.
  • Ryan Lowe, Nissan Pow, Iulian V Serban, and Joelle Pineau. 2015. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 285.
  • J. McCarthy, M. L. Minsky, N. Rochester, and C. E. Shannon. 1955. A proposal for the Dartmouth summer research project on artificial intelligence.
  • Gregoire Mesnil, Yann Dauphin, Kaisheng Yao, Yoshua Bengio, Li Deng, Dilek Hakkani-Tur, Xiaodong He, Larry Heck, Gokhan Tur, Dong Yu, et al. 2015. Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3):530–539.
  • Nikola Mrksic, Diarmuid O Seaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2017a. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1777–1788.
  • Nikola Mrksic, Ivan Vulic, Diarmuid O Seaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen, and Steve Young. 2017b. Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints. Transactions of the Association for Computational Linguistics, 5(1):309–324.
  • Alice H Oh and Alexander I Rudnicky. 2000. Stochastic language generation for spoken dialogue systems. In Proceedings of the 2000 ANLP/NAACL Workshop on Conversational Systems - Volume 3, pages 27–32. Association for Computational Linguistics.
  • Tim Paek and Roberto Pieraccini. 2008. Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, 50(8-9):716–729.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.
  • Ashwin Ram, Rohit Prasad, Chandra Khatri, Anu Venkatesh, Raefer Gabriel, Qing Liu, Jeff Nunn, Behnam Hedayatnia, Ming Cheng, Ashish Nagar, et al. 2018. Conversational AI: The science behind the Alexa Prize. arXiv preprint arXiv:1801.03604.
  • Osman Ramadan, Paweł Budzianowski, and Milica Gasic. 2018. Large-scale multi-domain belief tracking with knowledge sharing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, volume 2, pages 432–437.
  • Abhinav Rastogi, Dilek Hakkani-Tur, and Larry Heck. 2017. Scalable multi-domain dialogue state tracking. arXiv preprint arXiv:1712.10224.
  • Antoine Raux, Brian Langner, Dan Bohus, Alan W Black, and Maxine Eskenazi. 2005. Let’s go public! Taking a spoken dialog system to the real world. In Ninth European Conference on Speech Communication and Technology.
  • Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 172–180.
  • Nicolas Schrading, Cecilia Ovesdotter Alm, Ray Ptucha, and Christopher Homan. 2015. An analysis of domestic abuse discourse on Reddit. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2577–2583.
  • Stephanie Seneff and Joseph Polifroni. 2000. Dialogue management in the Mercury flight reservation system. In Proceedings of the 2000 ANLP/NAACL Workshop on Conversational Systems - Volume 3, ANLP/NAACL-ConvSyst ’00, pages 11–16, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • P Shah, D Hakkani-Tur, G Tur, A Rastogi, A Bapna, N Nayak, and L Heck. 2018. Building a conversational agent overnight with dialogue self-play. arXiv preprint arXiv:1801.04871.
  • Amanda Stent, Matthew Marge, and Mohit Singhai. 2005. Evaluating evaluation methods for generation in the presence of variation. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 341–351. Springer.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
  • Christopher Tegho, Paweł Budzianowski, and Milica Gasic. 2018. Benchmarking uncertainty estimates with deep reinforcement learning for dialogue policy optimisation. In IEEE ICASSP.
  • David R. Traum. 1999. Foundations of Rational Agency, chapter Speech Acts for Dialogue Agents. Springer.
  • David R Traum and Elizabeth A Hinkelman. 1992. Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8(3):575–599.
  • David R Traum and Staffan Larsson. 2003. The information state approach to dialogue management. In Current and New Directions in Discourse and Dialogue, pages 325–353. Springer.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M Rojas-Barahona, Pei-Hao Su, David Vandyke, and Steve Young. 2016. Multi-domain neural network language generation for spoken dialogue systems. ACL.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. EACL.
  • Jason Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. 2013. The dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, pages 404–413.
  • Steve Young, Milica Gasic, Blaise Thomson, and Jason Williams. 2013. POMDP-based statistical spoken dialogue systems: A review. In Proceedings of the IEEE, volume 99, pages 1–20.
Author
Pawel Budzianowski
Bo-Hsiang Tseng
Osman Ramadan