AmbigQA: Answering Ambiguous Open domain Questions

EMNLP 2020, pp.5783-5797, (2020)



Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer. In this paper, we introduce AmbigQA, a new open-domain question answering task which involves finding every plausible answer, and then rewriting the question for each o...



  • In the open-domain setting, it can be difficult to formulate clear and unambiguous questions.
  • Open-domain question answering (QA) systems aim to answer any factoid question (Voorhees et al, 1999), but existing methods assume that every question has a single, well-defined answer.
  • The authors define the task of ambiguous question answering, and present the first data and baseline methods for the study of how to disambiguate and answer such questions.
  • Ambiguity arises frequently in open-domain QA, where questions are written during information gathering without knowledge of the answer.
  • We extend a state-of-the-art model for NQ-OPEN (Karpukhin et al, 2020) with three new components: (1) set-based question answering with a BART-based sequence-to-sequence model (Lewis et al, 2020), (2) a question disambiguation model based on BART, and (3) a modification to democratic co-training (Zhou and Goldman, 2004) which allows this model to leverage the partial supervision available in the full NQ-OPEN dataset
  • We introduce AMBIGQA, a new task which requires identifying all plausible answers to an open-domain question, along with disambiguated questions to differentiate them
  • Our analysis shows that the dataset contains diverse types of ambiguity, often not visible from the prompt question alone but only found upon reading evidence documents
  • The authors describe the baseline models used in the experiments, followed by results and ablations.
  • This baseline disambiguates the prompt question without any context from plausible answers or reference passages.
  • It implements the following pipeline: (1) Feed the prompt question q into a BERT-based binary classifier to determine whether it is ambiguous.
  • The authors include a model based on Karpukhin et al (2020), with thresholding for multiple answer prediction and the BART-based question disambiguation (QD) model.
  • The authors produce disambiguated questions using the BART-based QD model (Section 5)
  • In example 1 of Table 6, the DISAMBIG-FIRST baseline asks about filming in 2017 and during season 1 for Snow White and the Huntsman, which was a film released in 2012.
  • This shows that reading evidence documents is crucial for identifying and characterizing ambiguities
  • The authors introduced AMBIGQA, a new task that involves providing multiple possible answers to a potentially ambiguous open-domain question, and providing a disambiguated question corresponding to each answer.
  • The authors introduced a first baseline model for producing multiple answers to open-domain questions, with experiments showing its effectiveness in learning from the data while highlighting avenues for future work.
  • Future work may investigate (1) more effective ways of dealing with highly ambiguous questions, (2) providing information related to the inferred information need when no answers are found, or (3) dealing with ill-formed questions
  • Table 1: Breakdown of the types of ambiguity based on 100 random samples from the AMBIGNQ development data
  • Table 2: Data statistics. For the number of QA pairs (# QAs), the minimum is taken when there is more than one accepted annotation
  • Table 3: Results on the development and test data. "all" and "multi" indicate all examples and examples with multiple question-answer pairs only, respectively. QD indicates a question disambiguation model. † indicates ensemble
  • Table 4: Ablations on question disambiguation (development data, multiple answers only). QD model refers to the question disambiguation model described in Section 5. For multiple answer prediction, we use SPANSEQGEN†
  • Table 5: Zero-shot performance on multiple answer prediction of the models trained on NQ-OPEN. We report Exact Match (EM) on NQ-OPEN and F1ans on AMBIGNQ
  • Table 6: Model predictions on samples from the development data. (#1) DISAMBIG-FIRST generates questions that look reasonable on the surface but don’t match the facts. SPANSEQGEN produces reasonable answers and questions, although they are not perfect. (#2) SPANSEQGEN produces correct answers and questions. (#3) The model produces the incorrect answer “February 9, 2018”, which is the release date of Fifty Shades Freed
  • Table 7: Exact Match (EM) on NQ-OPEN of different models, counting a prediction as correct if it matches Any gold reference, or only the First non-null one
  • Table 8: Breakdown of cases in which the NQ-OPEN answer is not included in the AMBIGNQ answers
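The two Exact Match variants named in the Table 7 caption (Any versus First non-null reference) and an answer-set F1 in the spirit of F1ans can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function names, the SQuAD-style normalization (lowercasing, stripping punctuation and articles), and the plain set-overlap definition of F1 are all assumptions, and the paper's F1ans may treat multiple annotations and answer aliases differently.

```python
import re
import string


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references, mode="any"):
    """Hypothetical EM check.

    references holds one entry per annotator; None marks a null annotation.
    mode="any" accepts a match with any non-null reference;
    mode="first" accepts a match only with the first non-null reference.
    """
    if mode not in ("any", "first"):
        raise ValueError("mode must be 'any' or 'first'")
    gold = [ref for ref in references if ref is not None]
    if not gold:
        return False
    if mode == "first":
        gold = gold[:1]
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in gold)


def answer_set_f1(predicted, gold):
    """Simplified set-level F1 between predicted and gold answer sets."""
    pred_set = {normalize(p) for p in predicted}
    gold_set = {normalize(g) for g in gold}
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that matches only a later annotator's answer counts as correct in "any" mode but not in "first" mode, which is the kind of gap the Table 7 comparison measures.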
Related work
  • Open-domain Question Answering requires a system to answer any factoid question based on evidence provided by a large text collection such as Wikipedia (Voorhees et al, 1999; Chen et al, 2017). Existing benchmarks include various kinds of questions, from open-ended information-seeking (Berant et al, 2013; Kwiatkowski et al, 2019; Clark et al, 2019) to more specialized trivia/quiz (Joshi et al, 2017; Dunn et al, 2017). To the best of our knowledge, all existing formulations of open-domain QA assume each question has a single clear answer.

    Our work is built upon an open-domain version of NATURAL QUESTIONS (Kwiatkowski et al, 2019), denoted NQ-OPEN, composed of questions posed by real users of Google search, each with an answer drawn from Wikipedia. NQ-OPEN has promoted several recent advances in open-domain question answering (Lee et al, 2019; Asai et al, 2020; Min et al, 2019a,b; Guu et al, 2020; Karpukhin et al, 2020). Nonetheless, Kwiatkowski et al (2019) report that the answers to such questions are often debatable, and the average agreement rate on NQ-OPEN test data is 49.2%, in large part due to ambiguous questions. In this paper, we embrace this ambiguity as inherent to information-seeking open QA, and present the first methods for returning sets of answers paired with different interpretations of the question.

  • Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W Bruce Croft. 2019. Asking clarifying questions in open-domain information-seeking conversations. In SIGIR.
  • Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, and Caiming Xiong. 2020. Learning to retrieve reasoning paths over Wikipedia graph for question answering. In ICLR.
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In EMNLP.
  • Pavel Braslavski, Denis Savenkov, Eugene Agichtein, and Alina Dubatovka. 2017. What do you mean exactly? Analyzing clarification questions in CQA. In Proceedings of the 2017 Conference on Human Information Interaction and Retrieval.
  • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In ACL.
  • Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In NAACL.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Matthew Dunn, Levent Sagun, Mike Higgins, Ugur Guney, Volkan Cirik, and Kyunghyun Cho. 2017. SearchQA: A new Q&A dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179.
  • Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909.
  • Mandar Joshi, Eunsol Choi, Daniel S Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In ACL.
  • Vladimir Karpukhin, Barlas Oguz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906.
  • Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: a benchmark for question answering research. TACL.
  • Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. In ACL.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In ACL.
  • Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, and Luke Zettlemoyer. 2018. Crowdsourcing question-answer meaning representations. In NAACL.
  • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2019a. A discrete hard EM approach for weakly supervised question answering. In EMNLP.
  • Sewon Min, Danqi Chen, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2019b. Knowledge guided text retrieval and reading for open domain question answering. arXiv preprint arXiv:1911.03868.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch.
  • Sudha Rao and Hal Daumé III. 2018. Learning to ask good questions: Ranking clarification questions using neural expected value of perfect information. In ACL.
  • Sudha Rao and Hal Daumé III. 2019. Answer-based adversarial training for generating clarification questions. In NAACL.
  • Pavel Sountsov and Sunita Sarawagi. 2016. Length bias in encoder decoder models and a case for global conditioning. In EMNLP.
  • Felix Stahlberg and Bill Byrne. 2019. On NMT search errors and model errors: Cat got your tongue? In EMNLP.
  • Ellen M Voorhees et al. 1999. The TREC-8 question answering track report. In TREC.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Jingjing Xu, Yuechen Wang, Duyu Tang, Nan Duan, Pengcheng Yang, Qi Zeng, Ming Zhou, and Xu Sun. 2019. Asking clarification questions in knowledge-based question answering. In EMNLP.
  • Y. Zhou and S. Goldman. 2004. Democratic co-learning. In IEEE International Conference on Tools with Artificial Intelligence.
  • We use Amazon Mechanical Turk and Spacro (Michael et al., 2018) for crowdsourcing. All data was collected in February and March of 2020. We use the Google Search API restricted to English Wikipedia for the search tool.