Non-Autoregressive Dialog State Tracking

ICLR, 2020.

TL;DR: We propose Non-Autoregressive Dialog State Tracking (NADST), a novel framework that factors in potential dependencies among domains and slots, optimizing the model to predict dialogue states as complete sets rather than as separate slots.

Abstract:

Recent efforts in Dialogue State Tracking (DST) for task-oriented dialogues have progressed toward open-vocabulary or generation-based approaches, where models can generate slot value candidates from the dialogue history itself. These approaches have shown good performance gains, especially in complicated dialogue domains with dynamic…

Introduction
  • In task-oriented dialogues, a dialogue agent is required to assist humans with one or more tasks, such as finding a restaurant or booking a hotel.
  • Existing DST models can be categorized into two types: fixed- and open-vocabulary.
  • Fixed-vocabulary models assume a known slot ontology and generate a score for each candidate (slot, value) pair (Ramadan et al., 2018; Lee et al., 2019).
  • Recent approaches propose open-vocabulary models that can generate value candidates, especially for slots such as entity names and times, directly from the dialogue history (Lei et al., 2018; Wu et al., 2019).
Highlights
  • In task-oriented dialogues, a dialogue agent is required to assist humans with one or more tasks, such as finding a restaurant or booking a hotel.
  • A crucial part of a task-oriented dialogue system is Dialogue State Tracking (DST), which aims to identify user goals expressed during a conversation in the form of dialogue states
  • Our contributions in this work include: (1) we propose a novel framework of Non-Autoregressive Dialog State Tracking (NADST), which explicitly learns inter-dependencies across slots to decode dialogue states as a complete set rather than as individual slots; (2) we propose a non-autoregressive decoding scheme, which enjoys low latency for real-time dialogues and captures dependencies at the token level in addition to the slot level (a minimal sketch of this two-stage decoding follows this list); (3) we achieve state-of-the-art performance on the multi-domain task-oriented dialogue dataset MultiWOZ 2.1 (Budzianowski et al., 2018; Eric et al., 2019) while reducing inference latency by an order of magnitude; (4) we conduct extensive ablation studies in which our analysis reveals that our models can detect potential signals across slots and dialogue domains to generate more correct "sets" of slots for Dialogue State Tracking.
  • As can be seen in Table 2, although our models are designed for non-autoregressive decoding, they outperform state-of-the-art Dialogue State Tracking approaches that rely on autoregressive decoding, such as Wu et al. (2019).
  • We proposed Non-Autoregressive Dialog State Tracking (NADST), a novel neural architecture for Dialogue State Tracking that allows the model to explicitly learn dependencies at both the slot level and the token level, improving joint accuracy rather than just individual slot accuracy.
  • Our extensive experiments on the well-known MultiWOZ benchmark for large-scale multi-domain dialogue systems show that NADST achieves state-of-the-art accuracy on Dialogue State Tracking tasks while enjoying an inference latency an order of magnitude lower than prior work.
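
A minimal sketch of the two-stage, fertility-based decoding named in contribution (2), in PyTorch under toy dimensions with randomly initialized weights. The module and variable names (`FertilityDecoder`, `StateDecoder`, `MAX_FERT`) are ours, not the authors' released code; the full model stacks several attention layers over a partially delexicalized dialogue history and adds a pointer network, which we omit here.

```python
import torch
import torch.nn as nn

D_MODEL, VOCAB, MAX_FERT, N_SLOTS = 64, 100, 10, 35  # toy sizes; MultiWOZ has 35 (domain, slot) pairs

class FertilityDecoder(nn.Module):
    """Stage 1: for each (domain, slot) pair, predict a gate and a fertility,
    i.e. how many value tokens that slot's value will need."""
    def __init__(self):
        super().__init__()
        self.slot_emb = nn.Embedding(N_SLOTS, D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=4, batch_first=True)
        self.fertility = nn.Linear(D_MODEL, MAX_FERT)  # fertility classes 0..MAX_FERT-1
        self.gate = nn.Linear(D_MODEL, 3)              # e.g. none / dontcare / generate

    def forward(self, slot_ids, ctx):
        h, _ = self.attn(self.slot_emb(slot_ids), ctx, ctx)  # attend over encoded history
        return self.fertility(h).argmax(-1), self.gate(h).argmax(-1)

class StateDecoder(nn.Module):
    """Stage 2: decode the value tokens of *all* slots in one non-causal pass."""
    def __init__(self):
        super().__init__()
        self.slot_emb = nn.Embedding(N_SLOTS, D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=4, batch_first=True)
        self.vocab = nn.Linear(D_MODEL, VOCAB)

    def forward(self, slot_ids, fertilities, ctx):
        # Repeat each slot embedding `fertility` times: one position per value
        # token to generate (batch size 1 assumed, so `fertilities` is 1-D).
        expanded = torch.repeat_interleave(self.slot_emb(slot_ids), fertilities, dim=1)
        h, _ = self.attn(expanded, ctx, ctx)  # no causal mask: all tokens in parallel
        return self.vocab(h).argmax(-1)       # (1, sum(fertilities)) token ids

ctx = torch.randn(1, 50, D_MODEL)              # stand-in for the encoded dialogue history
slot_ids = torch.arange(N_SLOTS).unsqueeze(0)  # all (domain, slot) pairs decoded jointly
fert, gate = FertilityDecoder()(slot_ids, ctx)
# In the full model, slots gated as "none" would be dropped; here we just decode.
tokens = StateDecoder()(slot_ids, fert.squeeze(0).clamp(min=1), ctx)
```

Because every expanded slot position attends to every other position in the state decoder, dependencies across (domain, slot) pairs are still modeled even though no token waits on a previously generated one.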
Methods
  • 4.1 DATASET

    MultiWOZ (Budzianowski et al., 2018) is one of the largest publicly available multi-domain task-oriented dialogue datasets, spanning 7 domains.
  • The authors use the new version of the MultiWOZ dataset published by Eric et al. (2019).
  • Each dialogue has more than one domain.
  • The authors pre-processed the dialogues by tokenizing, lower-casing, and delexicalizing all system responses, following the pre-processing scripts from Wu et al. (2019); a sketch of the delexicalization step follows this list.
  • The authors identify a total of 35 (domain, slot) pairs.
  • Other details of the data pre-processing procedure, corpus statistics, and the list of (domain, slot) pairs are described in Appendix A.1.
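
As a rough illustration of the delexicalization step, the sketch below assumes a toy ontology slice; the placeholder format and helper name are illustrative rather than the exact scripts of Wu et al. (2019).

```python
import re

# Toy slice of the (domain, slot) -> known values ontology.
ONTOLOGY = {
    ("restaurant", "food"): ["italian", "chinese"],
    ("train", "leaveat"): ["9:45", "10:15"],
}

def delexicalize(utterance: str) -> str:
    """Lower-case the text and replace known slot values with domain_slot placeholders."""
    out = utterance.lower()
    for (domain, slot), values in ONTOLOGY.items():
        for value in values:
            out = re.sub(re.escape(value), f"{domain}_{slot}", out)
    return out

print(delexicalize("I found an Italian place; the train leaves at 9:45."))
# -> "i found an restaurant_food place; the train leaves at train_leaveat."
```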
Results
  • The authors evaluate model performance by joint goal accuracy, as is common in DST (Henderson et al., 2014b); a toy computation of the metric follows this list.
  • Following prior DST work, the authors report model performance on the restaurant domain of MultiWOZ 2.0 in Table 3.
  • In this dialogue domain, the model surpasses other DST models in both Joint Accuracy and Slot Accuracy.
  • In Figure 3, for a fair comparison between TRADE and NADST, the authors plot the latency of the original TRADE, which decodes the dialogue state slot by slot, and of a modified TRADE∗, which decodes individual slots with a parallel decoding mechanism.
  • In the ideal case, with access to ground-truth labels of both Xdel and Xds×fert, the model can obtain a joint accuracy of 73%.
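
For concreteness, the metric can be computed as below; function and variable names are ours. A turn counts as correct only when every (domain, slot, value) triple matches the gold state, which is why joint accuracy is much stricter than per-slot accuracy.

```python
def joint_goal_accuracy(predictions, golds):
    """predictions, golds: one dict per turn mapping (domain, slot) -> value."""
    correct = sum(pred == gold for pred, gold in zip(predictions, golds))
    return correct / len(golds)

golds = [{("hotel", "area"): "east", ("hotel", "stars"): "4"},
         {("hotel", "area"): "east"}]
preds = [{("hotel", "area"): "east", ("hotel", "stars"): "3"},  # one wrong slot
         {("hotel", "area"): "east"}]
print(joint_goal_accuracy(preds, golds))  # 0.5: the first turn fails as a whole
```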
Conclusion
  • The authors proposed NADST, a novel non-autoregressive neural architecture for DST that allows the model to explicitly learn dependencies at both the slot level and the token level, improving joint accuracy rather than just individual slot accuracy.
  • The authors' extensive experiments on the well-known MultiWOZ benchmark for large-scale multi-domain dialogue systems show that NADST achieves state-of-the-art accuracy on DST tasks while enjoying an inference latency an order of magnitude lower than prior work; a rough timing illustration follows.
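
A self-contained timing illustration of this gap (ours, not the paper's benchmark): an autoregressive decoder pays one forward pass per generated token, while a non-autoregressive decoder pays a single pass for all value tokens of all slots.

```python
import time
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
tokens = torch.randn(1, 70, 64)  # e.g. ~35 slots x ~2 value tokens each

with torch.no_grad():
    start = time.perf_counter()
    for t in range(1, tokens.size(1) + 1):  # autoregressive: grow the prefix token by token
        layer(tokens[:, :t])
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    layer(tokens)                           # non-autoregressive: one parallel pass
    parallel = time.perf_counter() - start

print(f"sequential: {sequential * 1e3:.1f} ms, parallel: {parallel * 1e3:.1f} ms")
```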
Tables
  • Table 1: A sample task-oriented dialogue with annotated dialogue states after each user turn. The dialogue states in red and blue denote slots from the attraction domain and the train domain, respectively. Slot values are expressed in user and system utterances (highlighted by underlined text)
  • Table 2: DST Joint Accuracy metric on MultiWOZ 2.1 and 2.0. †: results reported on the MultiWOZ 2.0 leaderboard. : results reported by Eric et al. (2019). Best results are highlighted in bold
  • Table 3: DST joint accuracy and slot accuracy on the MultiWOZ 2.0 restaurant domain. Baseline results (except TSCP) are from Wu et al. (2019)
  • Table 4: Latency analysis on MultiWOZ 2.1. Latency is reported as wall-clock time in ms per state prediction
  • Table 5: Ablation analysis on MultiWOZ 2.1 of 4 components: partially delexicalized dialogue history Xdel, slot gating, positional encoding PE(Xds×fert), and pointer network
  • Table 6: Performance of autoregressive model variants on MultiWOZ 2.0 and 2.1. Fertility prediction is removed, as fertility becomes redundant in autoregressive models
  • Table 7: Summary of the MultiWOZ 2.1 dataset
  • Table 8: Additional domain-specific results of our model on MultiWOZ 2.0 and MultiWOZ 2.1. The model performs best on the restaurant domain and worst on the taxi domain
  • Table 9: Additional results of our model on MultiWOZ 2.1 when we assume access to the ground-truth labels of Xdel and Xds×fert (oracle prediction). We vary the percentage of using the model predictions Xdel and Xds×fert from 100% (true prediction) to 0% (oracle prediction)
  • Table 10: Full set of predicted dialogue states for dialogue ID MUL0536 in MultiWOZ 2.1
  • Table 11: Full set of predicted dialogue states for dialogue ID PMUL3759 in MultiWOZ 2.1
Related work
  • Our work is related to two research areas: dialogue state tracking and non-autoregressive decoding.

    2.1 DIALOGUE STATE TRACKING

    Dialogue State Tracking (DST) is an important component in task-oriented dialogues, especially for dialogues with complex domains that require fine-grained tracking of relevant slots. Traditionally, DST is coupled with Natural Language Understanding (NLU): NLU output, in the form of tagged user utterances, is fed to DST models to update the dialogue states turn by turn (Kurata et al., 2016; Shi et al., 2016; Rastogi et al., 2017). Recent approaches combine NLU and DST to reduce the credit assignment problem and remove the need for NLU (Mrksic et al., 2017; Xu & Hu, 2018; Zhong et al., 2018). Within this body of research, Goel et al. (2019) differentiates two DST approaches: fixed- and open-vocabulary. Fixed-vocabulary approaches are usually retrieval-based methods in which all candidate (slot, value) pairs from a given slot ontology are considered and the models predict a probability score for each pair (Henderson et al., 2014c; Ramadan et al., 2018; Lee et al., 2019). Recent work has moved towards open-vocabulary approaches that can generate the candidates from input text, i.e., the dialogue history (Lei et al., 2018; Gao et al., 2019; Wu et al., 2019). Our work is most closely related to these models but, unlike most current work, explicitly considers dependencies among slots and domains to decode the dialogue state as a complete set; a toy contrast of the two families follows.
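
The sketch below contrasts the two families with stand-in tensors rather than any published model: fixed-vocabulary tracking scores a closed candidate list, while open-vocabulary tracking emits a distribution over the whole vocabulary (optionally copying from the history).

```python
import torch
import torch.nn as nn

d = 64
ctx = torch.randn(d)  # stand-in for an encoded dialogue history

# Fixed-vocabulary: score each candidate value from the ontology; values
# outside the ontology are unreachable.
candidates = ["italian", "chinese", "none"]
cand_embs = torch.randn(len(candidates), d)
best = candidates[int((cand_embs @ ctx).argmax())]

# Open-vocabulary: generate over the full vocabulary, so unseen values such as
# entity names remain reachable.
vocab_head = nn.Linear(d, 1000)
next_token_logits = vocab_head(ctx)
print(best, next_token_logits.shape)
```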
Reference
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Inigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gasic. MultiWOZ - a large-scale multi-domain wizard-of-Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 5016–5026, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1547. URL https://www.aclweb.org/anthology/D18-1547.
  • Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyzdRiR9Y7.
  • Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, and Dilek Hakkani-Tur. MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines. arXiv preprint arXiv:1907.01669, 2019.
  • Shuyang Gao, Abhishek Sethi, Sanchit Aggarwal, Tagyoung Chung, and Dilek Hakkani-Tur. Dialog state tracking: A neural reading comprehension approach. arXiv preprint arXiv:1908.01946, 2019.
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. Constant-time machine translation with conditional masked language models. arXiv preprint arXiv:1904.09324, 2019.
  • Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
  • Rahul Goel, Shachi Paul, and Dilek Hakkani-Tur. HyST: A hybrid approach for flexible and accurate dialogue state tracking. Proc. Interspeech 2019, pp. 1458–1462, 2019.
  • Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376. ACM, 2006.
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. Non-autoregressive neural machine translation. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=B1l8BtlCb.
  • Matthew Henderson, Blaise Thomson, and Jason D. Williams. The second dialog state tracking challenge. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272, 2014a.
  • Matthew Henderson, Blaise Thomson, and Steve Young. Word-based dialog state tracking with recurrent neural networks. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 292–299, 2014b.
  • Matthew Henderson, Blaise Thomson, and Steve J. Young. Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation. 2014 IEEE Spoken Language Technology Workshop (SLT), pp. 360–365, 2014c.
  • Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, and Noam Shazeer. Fast decoding in sequence models using discrete latent variables. In International Conference on Machine Learning, pp. 2395–2404, 2018.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  • Gakuto Kurata, Bing Xiang, Bowen Zhou, and Mo Yu. Leveraging sentence-level information with encoder LSTM for semantic slot filling. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2077–2083, Austin, Texas, November 2016. Association for Computational Linguistics. doi: 10.18653/v1/D16-1223. URL https://www.aclweb.org/anthology/D16-1223.
  • Hwaran Lee, Jinsik Lee, and Tae-Yoon Kim. SUMBT: Slot-utterance matching for universal and scalable belief tracking. In ACL, 2019.
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1173–1182, 2018.
  • Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He, and Dawei Yin. Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1437–1447, 2018.
  • Jindřich Libovický and Jindřich Helcl. End-to-end non-autoregressive neural machine translation with connectionist temporal classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3016–3021, 2018.
  • Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, Lisbon, Portugal, September 2015. Association for Computational Linguistics. doi: 10.18653/v1/D15-1166. URL https://www.aclweb.org/anthology/D15-1166.
  • Nikola Mrksic, Diarmuid O Seaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1777–1788. Association for Computational Linguistics, 2017. doi: 10.18653/v1/P17-1163. URL http://www.aclweb.org/anthology/P17-1163.
  • Elnaz Nouri and Ehsan Hosseini-Asl. Toward scalable neural dialogue state tracking model. arXiv preprint arXiv:1812.00899, 2018.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
  • Osman Ramadan, Paweł Budzianowski, and Milica Gasic. Large-scale multi-domain belief tracking with knowledge sharing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, volume 2, pp. 432–437, 2018.
  • Abhinav Rastogi, Dilek Z. Hakkani-Tur, and Larry P. Heck. Scalable multi-domain dialogue state tracking. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 561–568, 2017.
  • Holger Schwenk. Continuous space translation models for phrase-based statistical machine translation. In Proceedings of COLING 2012: Posters, pp. 1071–1080, Mumbai, India, December 2012. The COLING 2012 Organizing Committee. URL https://www.aclweb.org/anthology/C12-2104.
  • Abigail See, Peter J. Liu, and Christopher D. Manning. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1073–1083, 2017.
  • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
  • Yangyang Shi, Kaisheng Yao, Hu Chen, Dong Yu, Yi-Cheng Pan, and Mei-Yuh Hwang. Recurrent support vector machines for slot tagging in spoken language understanding. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 393–399, San Diego, California, June 2016. Association for Computational Linguistics. doi: 10.18653/v1/N16-1044. URL https://www.aclweb.org/anthology/N16-1044.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826, 2016.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
  • Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems 28, pp. 2692–2700. Curran Associates, Inc., 2015. URL http://papers.nips.cc/paper/5866-pointer-networks.pdf.
  • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. Non-autoregressive machine translation with auxiliary regularization. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 5377–5384, 2019.
  • Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. A network-based end-to-end trainable task-oriented dialogue system. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 438–449, Valencia, Spain, April 2017. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/E17-1042.
  • Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, and Pascale Fung. Transferable multi-domain state generator for task-oriented dialogue systems. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 808–819, Florence, Italy, July 2019. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/P19-1078.
  • Puyang Xu and Qi Hu. An end-to-end approach for handling unknown slot values in dialogue state tracking. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1448–1457. Association for Computational Linguistics, 2018. URL http://aclweb.org/anthology/P18-1134.
  • Victor Zhong, Caiming Xiong, and Richard Socher. Global-locally self-attentive encoder for dialogue state tracking. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1458–1467, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1135. URL https://www.aclweb.org/anthology/P18-1135.
Appendix A.1
  • We follow similar data preprocessing procedures as Budzianowski et al. (2018) and Wu et al. (2019) on both MultiWOZ 2.0 and 2.1. The resulting corpus includes 8,438 multi-turn dialogues in the training set, with an average of 13.5 turns per dialogue. The test and validation sets each include 1,000 multi-turn dialogues, with an average of 14.7 turns per dialogue. The average number of domains per dialogue is 1.8 for the training, validation, and test sets. The MultiWOZ corpus includes a much larger ontology than previous DST datasets such as WOZ (Wen et al., 2017) and DSTC2 (Henderson et al., 2014a). We identified a total of 35 (domain, slot) pairs across 7 domains; however, only 5 domains are included in the test data. Refer to Table 7 for the statistics of dialogues in these 5 domains.