BERT for Joint Intent Classification and Slot Filling

arXiv: Computation and Language (2019)

We propose a joint intent classification and slot filling model based on Bidirectional Encoder Representations from Transformers (BERT).

Abstract

Intent classification and slot filling are two essential tasks for natural language understanding. They often suffer from small-scale human-labeled training data, resulting in poor generalization capability, especially for rare words. Recently a new language representation model, BERT (Bidirectional Encoder Representations from Transformers), …

Introduction
  • A variety of smart speakers, such as Google Home, Amazon Echo, and Tmall Genie, have been deployed and achieved great success; they facilitate goal-oriented dialogues and help users accomplish their tasks through voice interactions.
  • Natural language understanding (NLU) is critical to the performance of goal-oriented spoken dialogue systems.
  • NLU typically includes the intent classification and slot filling tasks, aiming to form a semantic parse for user utterances.
Highlights
  • In recent years, a variety of smart speakers, such as Google Home, Amazon Echo, and Tmall Genie, have been deployed and achieved great success; they facilitate goal-oriented dialogues and help users accomplish their tasks through voice interactions
  • Intent classification focuses on predicting the intent of the query, while slot filling extracts semantic concepts
  • The technical contributions of this work are twofold: 1) we explore the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model to address the poor generalization capability of natural language understanding (NLU) models; 2) we propose a joint intent classification and slot filling model based on BERT and demonstrate that it achieves significant improvements in intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on several public benchmark datasets, compared to attention-based recurrent neural network (RNN) models and slot-gated models (a minimal model sketch follows this list)
  • On Snips, joint BERT achieves intent classification accuracy of 98.6%, slot filling F1 of 97.0%, and sentence-level semantic frame accuracy of 92.8%
  • We propose a joint intent classification and slot filling model based on BERT, aiming to address the poor generalization capability of traditional NLU models
  • Our proposed joint BERT model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on ATIS and Snips datasets over previous state-of-the-art models
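
    A minimal sketch of such a joint model, assuming the Hugging Face "transformers" library in PyTorch and the standard design in which the intent is read off the [CLS] representation and slot labels off the per-token hidden states; the authors' released implementation is TensorFlow-based, so the class and variable names here are illustrative, not theirs.

        # Hypothetical sketch of a joint intent/slot model on top of BERT.
        import torch.nn as nn
        from transformers import BertModel

        class JointBert(nn.Module):
            def __init__(self, num_intents, num_slot_labels, dropout=0.1):
                super().__init__()
                self.bert = BertModel.from_pretrained("bert-base-uncased")
                hidden = self.bert.config.hidden_size
                self.dropout = nn.Dropout(dropout)
                self.intent_classifier = nn.Linear(hidden, num_intents)
                self.slot_classifier = nn.Linear(hidden, num_slot_labels)

            def forward(self, input_ids, attention_mask):
                out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
                # Pooled [CLS] representation -> one intent prediction per utterance.
                intent_logits = self.intent_classifier(self.dropout(out.pooler_output))
                # Per-token hidden states -> one slot label per token.
                slot_logits = self.slot_classifier(self.dropout(out.last_hidden_state))
                return intent_logits, slot_logits

    Training would minimize the sum of the intent and slot cross-entropy losses, so the two tasks are learned jointly rather than separately.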
Methods
  • Experiments and Analysis

    The authors evaluate the proposed model on two public benchmark datasets, ATIS and Snips.

    The ATIS dataset (Tur et al, 2010) is widely used in NLU research; it includes audio recordings of people making flight reservations.
  • The authors use the same data division as Goo et al (2018) for both datasets.
  • The training, development and test sets of Snips contain 13,084, 700 and 700 utterances, respectively.
  • There are 72 slot labels and 7 intent types in the Snips training set (an illustrative labeled example follows this list)
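
    Each utterance in these datasets is paired with a single intent and per-token slot labels in the BIO scheme (as in the Snips case shown under Results). A purely illustrative, hypothetical representation of one labeled instance, with field and label names not taken from the paper:

        # Hypothetical representation of one labeled NLU training instance.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class NluExample:
            tokens: List[str]       # tokenized utterance
            slot_labels: List[str]  # one BIO slot label per token
            intent: str             # single intent label for the whole utterance

        example = NluExample(
            tokens=["find", "flights", "to", "boston", "tomorrow"],
            slot_labels=["O", "O", "O", "B-toloc.city_name", "B-depart_date.date_relative"],
            intent="atis_flight",
        )
        assert len(example.tokens) == len(example.slot_labels)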
Results
  • [Fragment of the ablation table (Table 3): columns Model, Epochs, Intent, Slot, comparing a non-joint variant with joint BERT; the numeric values are not recoverable here.]

    Table 2 shows the model performance as slot filling F1, intent classification accuracy, and sentence-level semantic frame accuracy on the Snips and ATIS datasets.

    [1] https://github.com/google-research/bert

    A case from the Snips dataset (Table 4), where the query is "need to see mother joan of the angels in one second":

    Gold (predicted correctly by joint BERT):
      Intent: SearchScreeningEvent
      Slots:  O O O B-movie-name I-movie-name I-movie-name I-movie-name I-movie-name B-timeRange I-timeRange I-timeRange

    Predicted by the Slot-Gated Model (Goo et al, 2018):
      Intent: BookRestaurant
      Slots:  O O O B-object-name I-object-name I-object-name I-object-name I-object-name B-timeRange I-timeRange I-timeRange
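
    The slot tag sequences above follow the BIO convention, so predicted labels have to be grouped back into slot spans. A minimal, paper-independent sketch of this decoding step (label spelling follows the table above):

        # Turn a BIO tag sequence into (slot_type, text) spans.
        def bio_to_spans(tokens, tags):
            spans, current = [], None
            for token, tag in zip(tokens, tags):
                if tag.startswith("B-"):
                    if current:
                        spans.append(current)
                    current = [tag[2:], [token]]
                elif tag.startswith("I-") and current and current[0] == tag[2:]:
                    current[1].append(token)
                else:  # "O", or an I- tag that does not continue the open span
                    if current:
                        spans.append(current)
                    current = None
            if current:
                spans.append(current)
            return [(label, " ".join(words)) for label, words in spans]

        tokens = "need to see mother joan of the angels in one second".split()
        tags = ["O", "O", "O", "B-movie-name", "I-movie-name", "I-movie-name",
                "I-movie-name", "I-movie-name", "B-timeRange", "I-timeRange", "I-timeRange"]
        print(bio_to_spans(tokens, tags))
        # [('movie-name', 'mother joan of the angels'), ('timeRange', 'in one second')]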

    The first group of models consists of the state-of-the-art baseline joint intent classification and slot filling models: a sequence-based joint model using BiLSTM (Hakkani-Tur et al, 2016), an attention-based model (Liu and Lane, 2016), and a slot-gated model (Goo et al, 2018).

    The second group of models includes the proposed joint BERT models.
  • On Snips, joint BERT achieves intent classification accuracy of 98.6%, slot filling F1 of 97.0%, and sentence-level semantic frame accuracy of 92.8%.
  • On ATIS, joint BERT achieves intent classification accuracy of 97.5%, slot filling F1 of 96.1%, and sentence-level semantic frame accuracy of 88.2%.
  • Joint BERT+CRF replaces the softmax classifier with a CRF layer and performs comparably to joint BERT, probably because the self-attention mechanism in the Transformer may already model the label structure sufficiently (a sketch of this variant follows)
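
    A hedged sketch of the joint BERT+CRF variant: a linear-chain CRF is placed over the per-token slot emissions instead of an independent softmax, so label transitions (for example, an I- tag following a B- tag) are scored jointly. This uses the third-party pytorch-crf package purely for illustration; the authors' implementation may differ.

        import torch.nn as nn
        from torchcrf import CRF  # pip install pytorch-crf

        class SlotCrfHead(nn.Module):
            def __init__(self, hidden_size, num_slot_labels):
                super().__init__()
                self.emission = nn.Linear(hidden_size, num_slot_labels)
                self.crf = CRF(num_slot_labels, batch_first=True)

            def loss(self, token_states, slot_label_ids, mask):
                # Negative log-likelihood of the gold label sequence under the CRF.
                emissions = self.emission(token_states)
                return -self.crf(emissions, slot_label_ids, mask=mask)

            def decode(self, token_states, mask):
                # Viterbi decoding of the most likely slot label sequence.
                return self.crf.decode(self.emission(token_states), mask=mask)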
Conclusion
  • The authors propose a joint intent classification and slot filling model based on BERT, aiming to address the poor generalization capability of traditional NLU models.
  • Experimental results show that the proposed joint BERT model outperforms BERT models modeling intent classification and slot filling separately, demonstrating the efficacy of exploiting the relationship between the two tasks.
  • The authors' proposed joint BERT model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on ATIS and Snips datasets over previous state-of-the-art models.
  • Future work includes evaluating the proposed approach on other large-scale and more complex NLU datasets, and exploring the efficacy of combining external knowledge with BERT.
Tables
  • Table1: An example from user query to semantic frame
  • Table2: NLU performance on Snips and ATIS datasets. The metrics are intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy (%). The results for the first group of models are cited from Goo et al (2018)
  • Table3: Ablation Analysis for the Snips dataset
  • Table4: A case in the Snips dataset
Performance highlights
  • On Snips, joint BERT achieves intent classification accuracy of 98.6% (from 97.0%), slot filling F1 of 97.0% (from 88.8%), and sentence-level semantic frame accuracy of 92.8% (from 75.5%)
  • On ATIS, joint BERT achieves intent classification accuracy of 97.5% (from 94.1%), slot filling F1 of 96.1% (from 95.2%), and sentence-level semantic frame accuracy of 88.2% (from 82.6%)
  • Without joint learning, intent classification accuracy drops to 98.0% (from 98.6%), and slot filling F1 drops to 95.8% (from 97.0%)
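
    For reference, sentence-level semantic frame accuracy, the strictest of the three metrics reported above, is commonly computed as below: an utterance counts as correct only when both the intent and all slot labels are predicted correctly. This is a sketch, not the authors' evaluation code.

        # Sentence-level semantic frame accuracy.
        def frame_accuracy(examples):
            """examples: iterable of (gold_intent, pred_intent, gold_slots, pred_slots)."""
            correct = total = 0
            for gold_intent, pred_intent, gold_slots, pred_slots in examples:
                total += 1
                if gold_intent == pred_intent and gold_slots == pred_slots:
                    correct += 1
            return correct / total if total else 0.0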
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
  • Alice Coucke, Alaa Saade, Adrien Ball, Theodore Bluche, Alexandre Caulier, David Leroy, Clement Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Mael Primet, and Joseph Dureau. 2018. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. CoRR, abs/1805.10190.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
  • Chih-Wen Goo, Guang Gao, Yun-Kai Hsu, Chih-Li Huo, Tsung-Chieh Chen, Keng-Wei Hsu, and Yun-Nung Chen. 2018. Slot-gated modeling for joint slot filling and intent prediction. In NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), pages 753–757.
  • Daniel Guo, Gokhan Tur, Wen-tau Yih, and Geoffrey Zweig. 2014. Joint semantic utterance classification and slot filling with recursive neural networks. In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014, pages 554–559.
  • Dilek Hakkani-Tur, Gokhan Tur, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, and Ye-Yi Wang. 2016. Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016, pages 715–719.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1746–1751. ACL.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
  • Gakuto Kurata, Bing Xiang, Bowen Zhou, and Mo Yu. 2016. Leveraging sentence-level information with encoder LSTM for natural language understanding. CoRR, abs/1601.01530.
  • Bing Liu and Ian Lane. 2016. Attention-based recurrent neural network models for joint intent detection and slot filling. In Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016, pages 685–689.
  • Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. In ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1–10. Association for Computational Linguistics.
  • Baolin Peng, Kaisheng Yao, Li Jing, and Kam-Fai Wong. 2015. Recurrent neural networks with external memory for spoken language understanding. In NLPCC 2015, Nanchang, China, October 9-13, 2015, Proceedings, pages 25–35.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 2227–2237.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
  • Suman V. Ravuri and Andreas Stolcke. 2015. Recurrent neural network and LSTM models for lexical utterance classification. In INTERSPEECH 2015, Dresden, Germany, September 6-10, 2015, pages 135–139. ISCA.
  • Gokhan Tur, Dilek Hakkani-Tur, and Larry P. Heck. 2010. What is left to be understood in ATIS? In 2010 IEEE Spoken Language Technology Workshop, SLT 2010, Berkeley, California, USA, December 12-15, 2010, pages 19–24.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6000–6010.
  • Ngoc Thang Vu. 2016. Sequential convolutional neural networks for slot filling in spoken language understanding. In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pages 3250–3254. ISCA.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.
  • Puyang Xu and Ruhi Sarikaya. 2013. Convolutional neural network based triangular CRF for joint intent detection and slot filling. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12, 2013, pages 78–83.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and Eduard H. Hovy. 2016. Hierarchical attention networks for document classification. In NAACL-HLT 2016, San Diego, California, USA, June 12-17, 2016, pages 1480–1489. The Association for Computational Linguistics.
  • Kaisheng Yao, Baolin Peng, Yu Zhang, Dong Yu, Geoffrey Zweig, and Yangyang Shi. 2014. Spoken language understanding using long short-term memory neural networks. In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014, pages 189–194.
  • Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In NIPS 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 649–657.
  • Lin Zhao and Zhe Feng. 2018. Improving slot filling in spoken language understanding with joint pointer and attention. In ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 426–431.
  • Zhiwei Zhao and Youzheng Wu. 2016. Attention-based convolutional neural networks for sentence classification. In Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016, pages 705–709. ISCA.
  • Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1127–1137.
  • Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 19–27.
Authors
Qian Chen
Zhu Zhuo
Wen Wang