Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering

ACL, pp. 5709-5714, 2020.


Abstract:

We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks are used to pre-train the transformers, token- and utterance-level language modeling and utterance order prediction, that learn both token and utterance embeddings for better understanding in dialogue contexts. […]
Introduction
Highlights
  • Transformer-based contextualized embedding approaches such as BERT (Devlin et al, 2019), XLM (Conneau and Lample, 2019), XLNet (Yang et al, 2019), RoBERTa (Liu et al, 2019), and ALBERT (Lan et al, 2019) have re-established the state-of-the-art for practically all question answering (QA) tasks on general domain datasets such as SQUAD (Rajpurkar et al, 2016, 2018), MS MARCO (Nguyen et al, 2016), TRIVIAQA (Joshi et al, 2017), NEWSQA (Trischler et al, 2017), or NARRATIVEQA (Kočiský et al, 2018), and multiturn question datasets such as SQA (Iyyer et al, 2017), QUAC (Choi et al, 2018), COQA (Reddy et al, 2019), or CQA (Talmor and Berant, 2018)
  • For span-based question answering where the evidence documents are in the form of multiparty dialogue, the performance is still poor even with the latest transformer models (Sun et al, 2019; Yang and Choi, 2019) due to the challenges in representing utterances composed by heterogeneous speakers
  • Unlike sentences in a wiki or news article written by one author with a coherent topic, utterances in a dialogue are from multiple speakers who may talk about different topics in distinct manners such that they should not be represented by concatenating, but rather as sub-documents interconnected to one another
  • This paper presents a novel approach to the latest transformers that learns hierarchical embeddings for tokens and utterances for a better understanding in dialogue contexts
  • This paper introduces a novel transformer approach that effectively interprets hierarchical contexts in multiparty dialogue by learning utterance embeddings
  • We will evaluate our approach on other machine comprehension tasks using dialogues as evidence documents to further verify the generalizability of this work
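The hierarchical idea in the highlights above — representing each utterance as its own unit rather than concatenating all tokens into one flat stream — can be sketched minimally. Mean pooling and the `utterance_embeddings` helper below are illustrative stand-ins, not the paper's architecture, which learns utterance embeddings through dedicated pre-training tasks:

```python
import numpy as np

def utterance_embeddings(dialogue, token_vectors):
    """Pool token vectors into one embedding per utterance, so a
    dialogue becomes a sequence of utterance vectors instead of one
    flat token sequence. Mean pooling here is a simplification; the
    paper learns these embeddings via pre-training objectives."""
    per_utterance = []
    for speaker, tokens in dialogue:  # each utterance keeps its speaker
        vecs = [token_vectors[t] for t in tokens]
        per_utterance.append(np.mean(vecs, axis=0))
    return np.stack(per_utterance)  # shape: (num_utterances, dim)
```

Each row of the result can then serve as a sub-document representation for one speaker turn, to be contextualized jointly with the others.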
Methods
  • 3.1 Corpus

    Despite all the great work in QA, only two datasets are publicly available for machine comprehension that take dialogues as evidence documents.
  • One is DREAM, a multiple-choice reading comprehension dataset (Sun et al, 2019); the other is FRIENDSQA, containing transcripts from the TV show Friends with annotation for span-based question answering (Yang and Choi, 2019).
  • Since DREAM is for a reading comprehension task that does not need to find the answer contents in the evidence documents, it is not suitable for this approach; FRIENDSQA is chosen.
  • Yang and Choi (2019) randomly split the corpus to generate training, development, and evaluation sets such that scenes from the same episode can be distributed across those three sets, causing inflated accuracy scores.
  • For pre-training (§2.1), all transcripts from Seasons 5-10 are used as an additional training set
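To avoid the leakage described above, the corpus can be re-split by episode so that scenes from one episode never straddle two sets (Table 1 lists the paper's actual split). A minimal sketch, where `split_by_episode`, the scene dicts, and the episode-ID sets are hypothetical illustrations:

```python
def split_by_episode(scenes, dev_episodes, test_episodes):
    """Assign every scene to train/dev/test by its episode ID, so all
    scenes of an episode land in the same set and no scene from a
    training episode leaks into evaluation."""
    splits = {"train": [], "dev": [], "test": []}
    for scene in scenes:
        if scene["episode"] in test_episodes:
            splits["test"].append(scene)
        elif scene["episode"] in dev_episodes:
            splits["dev"].append(scene)
        else:
            splits["train"].append(scene)
    return splits
```

Grouping by episode rather than shuffling scenes is what prevents the inflated accuracy scores of the original random split.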
Results
  • Table 2 shows results achieved by all the models.
  • The performance of RoBERTa* is generally higher than BERT*. However, RoBERTabase is pre-trained on larger datasets than BERTbase, including CC-NEWS (Nagel, 2016), OPENWEBTEXT (Gokaslan and Cohen, 2019), and STORIES (Trinh and Le, 2018), such that results from those two types of transformers cannot be directly compared.
  • The compared models are BERT, BERTpre, BERTour, RoBERTa, RoBERTapre, and RoBERTaour; EM (exact match) is among the reported metrics.
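The EM metric referenced above is commonly computed SQuAD-style: normalize prediction and gold answers, then check for string equality. A minimal sketch of that convention (the paper's exact scorer may differ):

```python
import re
import string

def normalize(text):
    """Standard SQuAD-style normalization: lowercase, strip
    punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

Scores are then averaged over all questions, which is why small answer-boundary errors hurt EM more than token-overlap metrics.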
Conclusion
  • This paper introduces a novel transformer approach that effectively interprets hierarchical contexts in multiparty dialogue by learning utterance embeddings.
  • Two language modeling approaches are proposed, utterance-level masked LM and utterance order prediction.
  • Coupled with the joint inference between token span prediction and utterance ID prediction, these two language models significantly outperform two of the state-of-the-art transformer approaches, BERT and RoBERTa, on a span-based QA task called FriendsQA.
  • The authors will evaluate the approach on other machine comprehension tasks using dialogues as evidence documents to further verify the generalizability of this work
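Of the two proposed pre-training tasks, utterance order prediction (UOP) lends itself to a small data-construction sketch. The binary in-order/out-of-order framing and the `uop_instance` helper are assumptions for illustration; the paper's actual task definition may differ:

```python
import random

def uop_instance(utterances, shuffle_prob=0.5, seed=None):
    """Build one utterance-order-prediction example: with probability
    shuffle_prob the utterances are permuted (label 0, out of order),
    otherwise kept as-is (label 1, in order). Assumes at least two
    distinct utterances so a real permutation exists."""
    rng = random.Random(seed)
    if rng.random() < shuffle_prob:
        shuffled = utterances[:]
        while shuffled == utterances:  # re-shuffle until the order changes
            rng.shuffle(shuffled)
        return shuffled, 0
    return utterances[:], 1
```

A model trained on such instances must attend across utterance boundaries, which is the point of pre-training utterance-level representations.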
Summary
  • Introduction:

    Several limitations can be expected for language models trained on general domains to process dialogue.
  • Most of these models are pre-trained on formal writing, which is notably different from colloquial writing in dialogue; fine-tuning for the end tasks is often not sufficient to build robust dialogue models.
Tables
  • Table 1: New data split for FriendsQA. D/Q/A: # of dialogues/questions/answers, E: episode IDs
  • Table 2: Accuracies (± standard deviations) achieved by the BERT and RoBERTa models
  • Table 3: Results from the RoBERTaour model by different question types
  • Table 4: Results for the ablation studies. Note that the *uid⊕ULM⊕UOP models are equivalent to the *our models in Table 2, respectively
  • Table 5: Error types and their ratio with respect to the three most challenging question types
  • Table 6: Results from RoBERTa by question types
  • Table 7: Results from RoBERTapre by question types
  • Table 8: Results from RoBERTaour by question types
  • Table 9: Results from BERT by question types
  • Table 10: Results from BERTpre by question types
  • Table 11: Results from BERTour by question types
  • Table 12: An error example for the why question (Q). J: Joey, R: Rachel, P: Phoebe, M: Monica
  • Table 13: An error example for the who question (Q). P: Phoebe, R: Ross, S: Susan
  • Table 14: An error example for the how question (Q). J: Joey, G: The Girl
Funding
  • We gratefully acknowledge the support of the AWS Machine Learning Research Awards (MLRA)
References
  • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Alexis Conneau and Guillaume Lample. 2019. Cross-lingual Language Model Pretraining. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 7057–7067. Curran Associates, Inc.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL’19, pages 4171–4186.
  • Aaron Gokaslan and Vanya Cohen. 2019. OpenWebText Corpus.
  • Mohit Iyyer, Wen-tau Yih, and Ming-Wei Chang. 2017. Search-based neural structured learning for sequential question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1821–1831, Vancouver, Canada. Association for Computational Linguistics.
  • Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. 2018. The NarrativeQA Reading Comprehension Challenge. Transactions of the Association for Computational Linguistics, 6:317–328.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv, 1907.11692.
  • Sebastian Nagel. 2016. News Dataset Available.
  • Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. In Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 9, 2016.
  • Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics, 7:249–266.
  • Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. 2019. DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension. Transactions of the Association for Computational Linguistics, 7:217–231.
  • Alon Talmor and Jonathan Berant. 2018. The web as a knowledge-base for answering complex questions. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).
  • Trieu H. Trinh and Quoc V. Le. 2018. A Simple Method for Commonsense Reasoning. arXiv, 1806.02847.
  • Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2017. NewsQA: A Machine Comprehension Dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 6000–6010, USA. Curran Associates Inc.
  • Zhengzhe Yang and Jinho D. Choi. 2019. FriendsQA: Open-domain question answering on TV show transcripts. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 188–197, Stockholm, Sweden. Association for Computational Linguistics.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 5754–5764. Curran Associates, Inc.