Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

AAAI Conference on Artificial Intelligence, 2020.


Abstract:

Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidence. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem.

Introduction
  • Machine Reading Comprehension (RC) or Question answering (QA) has seen great advancement in recent years.
  • Most existing research in machine RC/QA focuses on answering a question given a single document or paragraph.
  • Although performance on these tasks has improved substantially over the last few years, the models used in them still lack the ability to reason across multiple documents when a single document is not enough to find the correct answer (Chen and Durrett 2019).
  • In order to improve a machine’s ability to do multi-hop reasoning over multiple documents, data sets such as HotpotQA (Yang et al. 2018) have been introduced.
Highlights
  • Machine Reading Comprehension (RC) or Question answering (QA) has seen great advancement in recent years
  • On the blind test set of HotpotQA, our SAE system attains top competitive results compared to other systems on the leaderboard at the time of submission (Sep 5th)
  • Our method improves joint Exact Match (EM) and F1 scores by more than 28 and 25 absolute points, respectively, over the baseline model
  • We find that for almost all measurements, the gap between our model using predicted gold documents and oracle gold documents is around 3-4%, which implies the effectiveness of our document selection module
  • We propose a new effective and interpretable system to tackle the multi-hop Reading Comprehension problem over multiple documents
  • Our proposed system attains competitive results on the HotpotQA blind test set compared to existing systems
Methods
  • The diagram of the proposed SAE system is shown in Figure 2.
  • The authors assume a setting where each example in the data set contains a question; a set of N documents; a set of labelled support sentences from different documents; and the answer text, which can be a span of text or “Yes/No”.
  • The authors derive the gold document labels from the answer and support sentence labels.
  • The authors use Di to denote document i: it is labelled 1 if Di is a gold document, and 0 otherwise.
  • The authors label the answer type as one of the following annotations: “Span”, “Yes” and “No” (see the label-derivation sketch below).
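
To make this label derivation concrete, here is a minimal Python sketch, assuming a HotpotQA-style example whose "context" field lists the N documents as (title, sentences) pairs and whose "supporting_facts" field lists (title, sentence index) pairs; the field names follow the public HotpotQA JSON format, not necessarily the authors' own code.

def derive_labels(example):
    # "context" is a list of [title, [sentence, ...]] pairs (the N documents).
    titles = [title for title, _sentences in example["context"]]
    # A document is gold (label 1) iff at least one supporting fact
    # (title, sentence index) points into it; otherwise it is labelled 0.
    gold_titles = {title for title, _sent_id in example["supporting_facts"]}
    doc_labels = [1 if title in gold_titles else 0 for title in titles]
    # The answer type is one of "Span", "Yes" and "No", read off the answer text.
    answer = example["answer"].strip().lower()
    answer_type = "Yes" if answer == "yes" else "No" if answer == "no" else "Span"
    return doc_labels, answer_type

example = {
    "context": [["Doc A", ["s1", "s2"]], ["Doc B", ["s1"]], ["Doc C", ["s1"]]],
    "supporting_facts": [["Doc A", 0], ["Doc C", 0]],
    "answer": "1992",
}
print(derive_labels(example))  # -> ([1, 0, 1], 'Span')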
Results
  • The authors' method improves joint EM and F1 scores by more than 28 and 25 absolute points, respectively, over the baseline model (the joint metrics are sketched below).
  • Compared to the DFGN model (Xiao et al. 2019) and the QFE model (Nishida et al. 2019), the SAE system is over 5 absolute points better in terms of both joint EM and F1 scores.
  • For SAE-oracle, the authors directly input the annotated gold documents of the dev set to obtain answer and support sentence predictions.
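
The joint EM and F1 scores used throughout these results follow the HotpotQA evaluation (Yang et al. 2018): joint precision and recall are the products of the answer-level and supporting-fact-level values, and joint EM requires both predictions to be exactly right. Below is a simplified Python sketch of that combination (set-based; the official script additionally normalizes answers and counts tokens as bags).

def prf(pred, gold):
    # Precision, recall and F1 between two collections, treated as sets.
    common = len(set(pred) & set(gold))
    if common == 0:
        return 0.0, 0.0, 0.0
    p = common / len(set(pred))
    r = common / len(set(gold))
    return p, r, 2 * p * r / (p + r)

def joint_metrics(ans_pred, ans_gold, sp_pred, sp_gold):
    # Answer metrics over whitespace tokens; support metrics over
    # (title, sentence index) pairs.
    ans_p, ans_r, _ = prf(ans_pred.split(), ans_gold.split())
    sp_p, sp_r, _ = prf(sp_pred, sp_gold)
    # Joint precision/recall are the products of the two levels.
    joint_p, joint_r = ans_p * sp_p, ans_r * sp_r
    joint_f1 = (2 * joint_p * joint_r / (joint_p + joint_r)
                if joint_p + joint_r > 0 else 0.0)
    # Joint EM requires both the answer and the support set to be exact.
    joint_em = float(ans_pred == ans_gold and set(sp_pred) == set(sp_gold))
    return joint_em, joint_f1

# Example: correct answer, but only one of two supporting facts found.
print(joint_metrics("Barack Obama", "Barack Obama",
                    [("Doc A", 0)], [("Doc A", 0), ("Doc C", 0)]))
# -> (0.0, 0.666...)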
Conclusion
  • The authors propose a new effective and interpretable system to tackle the multi-hop RC problem over multiple documents.
  • The authors' system first accurately filters out unrelated documents and then performs joint prediction of the answer and supporting evidence (sketched in code at the end of this section).
  • Several novel ideas for training the document filter model and the joint answer and support sentence prediction model are presented.
  • The authors' proposed system attains competitive results on the HotpotQA blind test set compared to existing systems.
  • The authors would like to thank Peng Qi of Stanford University for running evaluation on the submitted models.
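
To summarize the select-then-answer flow in code, here is a high-level Python sketch; the selector and reader callables are hypothetical placeholders standing in for the document filter model and the joint answer/support prediction model described above, not the authors' actual interfaces.

def sae_pipeline(question, documents, selector, reader, top_k=2):
    # Stage 1 (Select): score each question-document pair and keep only
    # the top-scoring documents as predicted gold documents.
    scores = [selector(question, doc) for doc in documents]
    ranked = sorted(range(len(documents)), key=lambda i: -scores[i])
    selected = [documents[i] for i in ranked[:top_k]]
    # Stage 2 (Answer and Explain): the reader jointly predicts the answer
    # (a span or yes/no) and the supporting sentences, using only the
    # selected documents so unrelated text cannot distract it.
    answer, support_sentences = reader(question, selected)
    return answer, support_sentences

# Toy usage with stand-in callables:
answer, sp = sae_pipeline(
    "Which city hosted the event?",
    ["doc text one", "doc text two", "doc text three"],
    selector=lambda q, d: len(set(q.split()) & set(d.split())),  # toy overlap score
    reader=lambda q, docs: ("answer span", [(0, 0)]),            # toy joint reader
)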
Tables
  • Table1: Results comparison between our proposed SAE system and other methods. ∗ indicates unpublished models
  • Table2: Ablation study results on HotpotQA dev set
  • Table3: Table 3
  • Table4: Performance comparison in terms of joint EM and F1 scores under different reasoning types
Related work
  • Multi-hop multi-document QA: The work in (Dhingra et al. 2018) designed a recurrent layer to explicitly exploit skip connections between entities from different documents given coreference predictions. An attention-based system proposed in (Zhong et al. 2019) shows that techniques such as co-attention and self-attention, widely employed in single-document RC tasks, are also useful in multi-document RC tasks. The study by (Song et al. 2018) adopted two separate Named Entity Recognition (NER) and coreference resolution systems to locate entities in support documents, which are then used in a graph neural network (GNN) to enable multi-hop reasoning across documents. Work by (De Cao, Aziz, and Titov 2019; Cao, Fang, and Tao 2019) directly used mentions of candidates as GNN nodes and calculated classification scores over those mentions. The study in (Tu et al. 2019) proposed a heterogeneous graph including document, candidate and entity nodes to enable rich information interaction at different granularity levels.

    Our proposed system differs from the previous models in that 1) our model is jointly trained and is capable of explaining the answer prediction by providing supporting sentences, and 2) we first filter out answer-unrelated documents and then perform answer prediction. Explainable QA: The study in (Zhou, Huang, and Zhu 2018) proposed an Interpretable Reasoning Network for QA over knowledge bases. The baseline model provided in the HotpotQA paper (Yang et al. 2018) and the QFE model (Nishida et al. 2019) also provide supporting sentences along with the answer prediction.
Funding
  • This work is partially supported by Beijing Academy of Artificial Intelligence (BAAI)
Reference
  • [Cao, Fang, and Tao 2019] Cao, Y.; Fang, M.; and Tao, D. 2019. Bag: Bi-directional attention entity graph convolutional network for multi-hop reasoning question answering. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 357–362.
  • [Chen and Durrett 2019] Chen, J., and Durrett, G. 2019. Understanding dataset design choices for multi-hop reasoning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4026–4032.
  • [Clark and Gardner 2018] Clark, C., and Gardner, M. 2018. Simple and effective multi-paragraph reading comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 845–855.
  • [De Cao, Aziz, and Titov 2019] De Cao, N.; Aziz, W.; and Titov, I. 2019. Question answering by reasoning across documents with graph convolutional networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2306–2317.
  • [Devlin et al. 2019] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186.
  • [Dhingra et al. 2018] Dhingra, B.; Jin, Q.; Yang, Z.; Cohen, W.; and Salakhutdinov, R. 2018. Neural models for reasoning over multiple mentions using coreference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), volume 2, 42–48.
  • [Kocisky et al. 2018] Kocisky, T.; Schwarz, J.; Blunsom, P.; Dyer, C.; Hermann, K. M.; Melis, G.; and Grefenstette, E. 2018. The narrativeqa reading comprehension challenge. Transactions of the Association for Computational Linguistics 6:317–328.
  • [Liu and others 2009] Liu, T.-Y., et al. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3(3):225–331.
  • [Liu et al. 2019] Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • [Min et al. 2018] Min, S.; Zhong, V.; Socher, R.; and Xiong, C. 2018. Efficient and robust question answering from minimal context over documents. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1725–1735.
  • [Nishida et al. 2019] Nishida, K.; Nishida, K.; Nagata, M.; Otsuka, A.; Saito, I.; Asano, H.; and Tomita, J. 2019. Answering while summarizing: Multi-task learning for multi-hop qa with evidence extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2335–2345.
  • [Paszke et al. 2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch.
  • [Rajpurkar et al. 2016] Rajpurkar, P.; Zhang, J.; Lopyrev, K.; and Liang, P. 2016. Squad: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2383–2392.
  • [Rajpurkar, Jia, and Liang 2018] Rajpurkar, P.; Jia, R.; and Liang, P. 2018. Know what you don't know: Unanswerable questions for squad. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, 784–789.
  • [Reddy, Chen, and Manning 2019] Reddy, S.; Chen, D.; and Manning, C. D. 2019. Coqa: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7:249–266.
  • [Rei and Søgaard 2019] Rei, M., and Søgaard, A. 2019. Jointly learning to label sentences and tokens. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 6916–6923.
  • [Seo et al. 2016] Seo, M.; Kembhavi, A.; Farhadi, A.; and Hajishirzi, H. 2016. Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
  • [Song et al. 2018] Song, L.; Wang, Z.; Yu, M.; Zhang, Y.; Florian, R.; and Gildea, D. 2018. Exploring graph-structured passage representation for multi-hop reading comprehension with graph neural networks. arXiv preprint arXiv:1809.02040.
  • [Tay et al. 2018] Tay, Y.; Luu, A. T.; Hui, S. C.; and Su, J. 2018. Densely connected attention propagation for reading comprehension. In Advances in Neural Information Processing Systems, 4911–4922.
  • [Tu et al. 2019] Tu, M.; Wang, G.; Huang, J.; Tang, Y.; He, X.; and Zhou, B. 2019. Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2704–2713.
  • [Vaswani et al. 2017] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in neural information processing systems, 5998–6008.
  • [Wang et al. 2019] Wang, H.; Yu, D.; Sun, K.; Chen, J.; Yu, D.; Roth, D.; and McAllester, D. 2019. Evidence sentence extraction for machine reading comprehension. arXiv preprint arXiv:1902.08852.
  • [Welbl, Stenetorp, and Riedel 2018] Welbl, J.; Stenetorp, P.; and Riedel, S. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics 6:287–302.
  • [Xiao et al. 2019] Xiao, Y.; Qu, Y.; Qiu, L.; Zhou, H.; Li, L.; Zhang, W.; and Yu, Y. 2019. Dynamically fused graph network for multihop reasoning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 6140–6150.
  • [Xiong, Zhong, and Socher 2016] Xiong, C.; Zhong, V.; and Socher, R. 2016. Dynamic coattention networks for question answering. arXiv preprint arXiv:1611.01604.
  • [Yang et al. 2018] Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.; Salakhutdinov, R.; and Manning, C. D. 2018. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2369–2380.
  • [Zhong et al. 2019] Zhong, V.; Xiong, C.; Keskar, N. S.; and Socher, R. 2019. Coarse-grain fine-grain coattention network for multi-evidence question answering. arXiv preprint arXiv:1901.00603.
  • [Zhou, Huang, and Zhu 2018] Zhou, M.; Huang, M.; and Zhu, X. 2018. An interpretable reasoning network for multi-relation question answering. In Proceedings of the 27th International Conference on Computational Linguistics, 2010–2022.