Open-Retrieval Conversational Question Answering

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 539–548.

DOI: https://doi.org/10.1145/3397271.3401110

Abstract:

Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this issue, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where evidence is retrieved from a large collection before answers are extracted.

Introduction
  • Conversational search is an embodiment of an iterative and interactive information retrieval (IR) system that has been studied for decades [2, 10, 29].
  • A significant limitation of the ConvQA setting is that an answer is either extracted from a given passage [34] or selected from a given candidate set [53].
  • This simplification neglects the fundamental role of retrieval in conversational search.
  • To address this issue, the authors introduce an open-retrieval ConvQA (ORConvQA) setting, where the system learns to retrieve evidence from a large collection before extracting answers (a minimal pipeline sketch follows this list).
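To make the open-retrieval setting concrete, here is a minimal sketch of a retrieve-rerank-read pipeline of the kind described above. The `retriever`, `reranker`, and `reader` objects and their method names are hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch of a retrieve-then-read ORConvQA pipeline.
# `retriever`, `reranker`, and `reader` are hypothetical objects
# standing in for the paper's Transformer-based components.

from typing import List, Tuple

def answer(question: str,
           history: List[str],
           retriever, reranker, reader,
           top_k: int = 100, rerank_k: int = 5) -> str:
    """Retrieve evidence from a large collection, rerank it,
    then extract an answer span, conditioning on dialog history."""
    # 1. Fold the conversation history into the current question.
    query = " [SEP] ".join(history + [question])

    # 2. First-stage retrieval over the full collection.
    passages: List[str] = retriever.retrieve(query, top_k=top_k)

    # 3. Rerank the candidates with a finer-grained matching model.
    passages = sorted(passages,
                      key=lambda p: reranker.score(query, p),
                      reverse=True)[:rerank_k]

    # 4. Read each top passage; keep the highest-scoring answer span.
    spans: List[Tuple[float, str]] = [reader.extract(query, p)
                                      for p in passages]
    return max(spans)[1]
```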
Highlights
  • Conversational search is an embodiment of an iterative and interactive information retrieval (IR) system that has been studied for decades [2, 10, 29].
  • Conversational QA (ConvQA) can be considered a simplified setting of conversational search [33].
  • Our work extends ConvQA to an open-retrieval setting as another fundamental step towards conversational search.
  • We present an end-to-end system that deals with the ORConvQA task described in Section 4.1.
  • We introduce an open-retrieval conversational question answering (QA) setting as a further step towards conversational search.
  • Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial in the ORConvQA setting (a dense-retrieval sketch follows this list).
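For intuition on what a "learnable retriever" means here, the sketch below indexes passage embeddings for inner-product search with Faiss, the similarity-search library the paper cites [19]. The `encode` function is a hypothetical placeholder for a trainable Transformer encoder; in a learnable retriever, its parameters are trained rather than fixed.

```python
# Sketch: dense retrieval over a large collection with Faiss [19].
# `encode` is a hypothetical stand-in for a trainable Transformer
# encoder; it returns random vectors here purely for illustration.

import numpy as np
import faiss

DIM = 128  # illustrative embedding size

def encode(texts):
    """Placeholder encoder; the real system learns these vectors."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), DIM)).astype("float32")

passages = ["first evidence passage ...", "second evidence passage ..."]

# Offline: embed and index the whole collection for inner-product
# (MIPS) search.
index = faiss.IndexFlatIP(DIM)
index.add(encode(passages))

# Online: embed the (history-augmented) question and retrieve top-k.
scores, ids = index.search(encode(["rewritten conversational query"]), 2)
print([(int(i), float(s)) for i, s in zip(ids[0], scores[0])])
```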
Results
  • The authors present the evaluation results, ablation studies on system components, and further analyses of the history window size and the number of passages used to fine-tune the retriever.
  • The authors report the main evaluation results in Table 3.
  • The authors tune the history window size w for all models that consider history and report performance under the best history setting (a history-window sketch follows this list).
  • The authors observe that DrQA performs poorly.
  • The main reason for this lies in the reader component.
  • The DrQA reader cannot natively handle unanswerable questions.
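For reference, the history window size w above controls how many previous turns are prepended to the current question before encoding. A minimal sketch, with an assumed concatenation format that is not necessarily the authors' exact one:

```python
# Sketch: forming a history-aware input with window size w, i.e.
# prepending the previous w turns to the current question.

from typing import List

def build_query(history: List[str], question: str, w: int) -> str:
    """Keep only the last w history turns when encoding the question."""
    window = history[-w:] if w > 0 else []
    return " [SEP] ".join(window + [question])

history = ["Who wrote Hamlet?", "When was it written?"]
print(build_query(history, "Where was the author born?", w=1))
# -> "When was it written? [SEP] Where was the author born?"
```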
Conclusion
  • In this work, the authors introduce an open-retrieval conversational QA setting as a further step towards conversational search.
  • The authors build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers.
  • The authors' extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial in the ORConvQA setting.
  • The authors further show that the system improves substantially when history modeling is enabled in all system components.
  • The authors show that the additional reranker component contributes to model performance by providing a regularization effect (a joint-training sketch follows this list).
  • In future work, the authors will investigate more effective history modeling methods.
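One way to read the regularization claim above is that the reranker contributes an extra loss term to the joint objective, pushing shared representations to also rank passages well. A hedged sketch of such a training step follows; the component interfaces and the equal weighting of the three terms are assumptions for illustration, not the paper's exact objective.

```python
# Sketch: a joint training step in which the reranker's loss is
# added to the reader's and retriever's, so shared parameters are
# also optimized for ranking. Components are hypothetical objects.

import torch

def training_step(batch, retriever, reranker, reader, optimizer):
    loss = (retriever.loss(batch)    # passage-level retrieval signal
            + reranker.loss(batch)   # passage-level ranking signal
            + reader.loss(batch))    # span-level extraction signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```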
Tables
  • Table 1: Comparison of selected QA tasks on the dimensions of open-retrieval (OR), conversational (Conv), information-seeking (IS), and whether motivated by genuine information needs (GIN). The symbol "-" indicates a mixed situation.
  • Table 2: Data statistics of the OR-QuAC dataset.
  • Table 3: Main evaluation results. "Rt" and "Rr" refer to "Retriever" and "Reranker" respectively. ‡ denotes a statistically significant improvement over the strongest baseline with p < 0.05.
  • Table 4: Results of ablation studies. "Rt" and "Rr" refer to "Retriever" and "Reranker" respectively. ‡ and † denote a statistically significant performance decrease compared to the full system with p < 0.05 and p < 0.1, respectively.
Related work
  • Our work is closely related to several research topics, including QA, open-domain QA, ConvQA, and conversational search. We mainly discuss retrieval-based methods since they tend to offer more informative responses [53] and are thus a better fit for information-seeking tasks than generation-based methods.

    Question Answering. One of the first modern reformulations of the QA task dates back to the TREC-8 Question Answering Track [46]. Its goal was to answer 200 fact-based, short-answer questions by leveraging a large collection of documents. A retrieval module is crucial in this task for retrieving relevant passages for answer extraction. As an increasing number of researchers in the natural language processing (NLP) community moved their focus to answer extraction and generation methods, the role of retrieval was gradually overlooked. As a result, many popular QA tasks and datasets follow either an answer selection setting [16, 47, 57] or a machine comprehension setting [23, 35, 36, 44]. In real-world scenarios, it is less practical to assume we are given a small set of candidate answers or a gold passage. Therefore, in this work, we make the retrieval component one of the focuses of our task formulation and model architecture.
Funding
  • This work was supported in part by the Center for Intelligent Information Retrieval, in part by NSF IIS-1715095, and in part by the China Postdoctoral Science Foundation (No. 2019M652038).
Reference
  • [1] A. Ahmad, N. Constant, Y. Yang, and D. M. Cer. ReQA: An Evaluation for End-to-End Answer Retrieval Models. ArXiv, 2019.
  • [2] N. J. Belkin, C. Cool, A. Stein, and U. Thiel. Cases, Scripts, and Information-seeking Strategies: On the Design of Interactive Information Retrieval Systems. 1995.
  • [3] K. Bi, Q. Ai, Y. Zhang, and W. B. Croft. Conversational Product Search Based on Negative Feedback. In CIKM, 2019.
  • [4] D. Chen, A. Fisch, J. Weston, and A. Bordes. Reading Wikipedia to Answer Open-Domain Questions. In ACL, 2017.
  • [5] Y. Chen, L. Wu, and M. J. Zaki. GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension. ArXiv, 2019.
  • [6] E. Choi, H. He, M. Iyyer, M. Yatskar, W.-T. Yih, Y. Choi, P. Liang, and L. Zettlemoyer. QuAC: Question Answering in Context. In EMNLP, 2018.
  • [7] A. Chuklin, A. Severyn, J. R. Trippas, E. Alfonseca, H. Silén, and D. Spina. Prosody Modifications for Question-Answering in Voice-Only Settings. ArXiv, 2018.
  • [8] C. Clark and M. Gardner. Simple and Effective Multi-Paragraph Reading Comprehension. In ACL, 2017.
  • [9] D. Cohen, L. Yang, and W. B. Croft. WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval. In SIGIR, 2018.
  • [10] W. B. Croft and R. H. Thompson. I3R: A New Approach to the Design of Document Retrieval Systems. JASIS, 38:389–404, 1987.
  • [11] R. Das, S. Dhuliawala, M. Zaheer, and A. McCallum. Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering. In ICLR, 2019.
  • [12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT, 2019.
  • [13] B. Dhingra, K. Mazaitis, and W. W. Cohen. Quasar: Datasets for Question Answering by Search and Reading. ArXiv, 2017.
  • [14] M. Dunn, L. Sagun, M. Higgins, V. U. Güney, V. Cirik, and K. Cho. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. ArXiv, 2017.
  • [15] A. Elgohary, D. Peskov, and J. L. Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. In EMNLP-IJCNLP, 2019.
  • [16] S. Garg, T. Vu, and A. Moschitti. TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection. In AAAI, 2020.
  • [17] P. M. Htut, S. R. Bowman, and K. Cho. Training a Ranking Function for Open-Domain Question Answering. In NAACL-HLT, 2018.
  • [18] H.-Y. Huang, E. Choi, and W.-T. Yih. FlowQA: Grasping Flow in History for Conversational Machine Comprehension. ArXiv, 2018.
  • [19] J. Johnson, M. Douze, and H. Jégou. Billion-scale Similarity Search with GPUs. ArXiv, 2017.
  • [20] M. Joshi, E. Choi, D. S. Weld, and L. Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In ACL, 2017.
  • [21] V. Karpukhin, B. Oguz, S. Min, L. Wu, S. Edunov, D. Chen, and W.-T. Yih. Dense Passage Retrieval for Open-Domain Question Answering. ArXiv, abs/2004.04906, 2020.
  • [22] B. Kratzwald and S. Feuerriegel. Adaptive Document Retrieval for Deep Question Answering. In EMNLP, 2018.
  • [23] T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins, A. P. Parikh, C. Alberti, D. Epstein, I. Polosukhin, J. Devlin, K. Lee, K. Toutanova, L. Jones, M. Kelcey, M.-W. Chang, A. M. Dai, J. Uszkoreit, Q. Le, and S. Petrov. Natural Questions: A Benchmark for Question Answering Research. TACL, 7:453–466, 2019.
  • [24] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ArXiv, 2019.
  • [25] J. Lee, S. Yun, H. Kim, M. Ko, and J. Kang. Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering. In EMNLP, 2018.
  • [26] K. Lee, M.-W. Chang, and K. Toutanova. Latent Retrieval for Weakly Supervised Open Domain Question Answering. In ACL, 2019.
  • [27] R. Lowe, N. Pow, I. Serban, and J. Pineau. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In SIGDIAL, 2015.
  • [28] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. ArXiv, 2016.
  • [29] R. N. Oddy. Information Retrieval through Man-Machine Dialogue. 1977.
  • [30] C. Qu, L. Yang, W. B. Croft, J. R. Trippas, Y. Zhang, and M. Qiu. Analyzing and Characterizing User Intent in Information-seeking Conversations. In SIGIR, 2018.
  • [31] C. Qu, L. Yang, W. B. Croft, F. Scholer, and Y. Zhang. Answer Interaction in Non-factoid Question Answering Systems. In CHIIR, 2019.
  • [32] C. Qu, L. Yang, W. B. Croft, Y. Zhang, J. R. Trippas, and M. Qiu. User Intent Prediction in Information-seeking Conversations. In CHIIR, 2019.
  • [33] C. Qu, L. Yang, M. Qiu, W. B. Croft, Y. Zhang, and M. Iyyer. BERT with History Answer Embedding for Conversational Question Answering. In SIGIR, 2019.
  • [34] C. Qu, L. Yang, M. Qiu, Y. Zhang, C. Chen, W. B. Croft, and M. Iyyer. Attentive History Selection for Conversational Question Answering. In CIKM, 2019.
  • [35] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP, 2016.
  • [36] P. Rajpurkar, R. Jia, and P. Liang. Know What You Don't Know: Unanswerable Questions for SQuAD. In ACL, 2018.
  • [37] S. Reddy, D. Chen, and C. D. Manning. CoQA: A Conversational Question Answering Challenge. TACL, 7:249–266, 2018.
  • [38] A. Shrivastava and P. Li. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). In NIPS, 2014.
  • [39] C. Tao, W. Wu, C. Xu, W. Hu, D. Zhao, and R. Yan. Multi-Representation Fusion Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. In WSDM, 2019.
  • [40] P. Thomas, D. J. McDuff, M. Czerwinski, and N. Craswell. MISC: A Data Set of Information-seeking Conversations. In SIGIR (CAIR'17), 2017.
  • [41] J. R. Trippas, D. Spina, L. Cavedon, and M. Sanderson. How Do People Interact in Conversational Speech-Only Search Tasks: A Preliminary Analysis. In CHIIR, 2017.
  • [42] J. R. Trippas, D. Spina, L. Cavedon, H. Joho, and M. Sanderson. Informing the Design of Spoken Conversational Search: Perspective Paper. In CHIIR, 2018.
  • [43] J. R. Trippas, D. Spina, P. Thomas, M. Sanderson, H. Joho, and L. Cavedon. Towards a Model for Spoken Conversational Search. ArXiv, 2019.
  • [44] A. Trischler, T. Wang, X. Yuan, J. Harris, A. Sordoni, P. Bachman, and K. Suleman. NewsQA: A Machine Comprehension Dataset. In Rep4NLP@ACL, 2016.
  • [45] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention Is All You Need. In NIPS, 2017.
  • [46] E. M. Voorhees and D. M. Tice. The TREC-8 Question Answering Track Evaluation. In TREC, 1999.
  • [47] M. Wang, N. A. Smith, and T. Mitamura. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In EMNLP-CoNLL, 2007.
  • [48] S. Wang, M. Yu, X. Guo, Z. Wang, T. Klinger, W. Zhang, S. Chang, G. Tesauro, B. Zhou, and J. Jiang. R3: Reinforced Ranker-Reader for Open-Domain Question Answering. In AAAI, 2018.
  • [49] Y. Wu, W. Y. Wu, M. Zhou, and Z. Li. Sequential Match Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots. In ACL, 2016.
  • [50] R. Yan, Y. Song, and H. Wu. Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System. In SIGIR, 2016.
  • [51] R. Yan, Y. Song, X. Zhou, and H. Wu. "Shall I Be Your Chat Companion?": Towards an Online Human-Computer Conversation System. In CIKM, 2016.
  • [52] L. Yang, H. Zamani, Y. Zhang, J. Guo, and W. B. Croft. Neural Matching Models for Question Retrieval and Next Question Prediction in Conversation. ArXiv, 2017.
  • [53] L. Yang, M. Qiu, C. Qu, J. Guo, Y. Zhang, W. B. Croft, J. Huang, and H. Chen. Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems. In SIGIR, 2018.
  • [54] L. Yang, J. Hu, M. Qiu, C. Qu, J. Gao, W. B. Croft, X. Liu, Y. Shen, and J. Liu. A Hybrid Retrieval-Generation Neural Conversation Model. In CIKM, 2019.
  • [55] L. Yang, M. Qiu, C. Qu, C. Chen, J. Guo, Y. Zhang, W. B. Croft, and H. Chen. IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems. In WWW, 2020.
  • [56] W. Yang, Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin. End-to-End Open-Domain Question Answering with BERTserini. In NAACL-HLT, 2019.
  • [57] Y. Yang, W.-T. Yih, and C. Meek. WikiQA: A Challenge Dataset for Open-Domain Question Answering. In EMNLP, 2015.
  • [58] M. Yatskar. A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC. In NAACL-HLT, 2018.
  • [59] Y.-T. Yeh and Y.-N. Chen. FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension. ArXiv, 2019.
  • [60] Y. Zhang, X. Chen, Q. Ai, L. Yang, and W. B. Croft. Towards Conversational Search and Recommendation: System Ask, User Respond. In CIKM, 2018.
  • [61] C. Zhu, M. Zeng, and X. Huang. SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering. ArXiv, 2018.