Hierarchical Graph Network for Multi-hop Question Answering

EMNLP 2020, pp. 8823-8838

Abstract

In this paper, we present Hierarchical Graph Network (HGN) for multi-hop question answering. To aggregate clues from scattered texts across multiple paragraphs, a hierarchical graph is created by constructing nodes on different levels of granularity (questions, paragraphs, sentences, entities), the representations of which are initialized...

Introduction
Highlights
  • In contrast to one-hop question answering (Rajpurkar et al., 2016; Trischler et al., 2016; Lai et al., 2017), where answers can be derived from a single paragraph (Wang and Jiang, 2017; Seo et al., 2017; Liu et al., 2018; Devlin et al., 2019), recent studies have increasingly focused on multi-hop reasoning across multiple documents or paragraphs for question answering
  • We propose a Hierarchical Graph Network (HGN) for multi-hop question answering, which provides multi-level fine-grained graphs with a hierarchical structure for joint answer and evidence prediction
  • The main contributions of this paper are threefold. (i) We propose a Hierarchical Graph Network (HGN) for multi-hop question answering, where heterogeneous nodes are woven into an integral unified graph (a toy sketch of such a graph follows this list). (ii) Nodes from different granularity levels are utilized for different sub-tasks, providing effective supervision signals for both supporting facts extraction and final answer prediction. (iii) HGN achieves new state of the art in both the Distractor and Fullwiki settings on the HotpotQA benchmark, outperforming previous work by a significant margin.
  • We propose a new approach, Hierarchical Graph Network (HGN), for multi-hop question answering
  • In the Fullwiki setting, an off-the-shelf paragraph retriever is adopted for selecting relevant context from a large corpus of text.
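
As an informal illustration of the hierarchical graph described above, the sketch below assembles question, paragraph, sentence, and entity nodes into a toy graph for the paper's running "Big Stone Gap" example. The function name (build_hierarchical_graph), the tuple-based node IDs, and the dictionary input format are assumptions made for illustration, not the authors' implementation.

    # Toy hierarchical graph for multi-hop QA (illustrative sketch only).
    from collections import defaultdict

    def build_hierarchical_graph(question, paragraphs):
        """paragraphs: list of dicts with 'title', 'sentences', and 'entities' keys."""
        nodes, edges = [], defaultdict(list)
        q_node = ("question", question)
        nodes.append(q_node)
        for para in paragraphs:
            p_node = ("paragraph", para["title"])
            nodes.append(p_node)
            edges[q_node].append(p_node)                  # question -> paragraph edge
            for s_idx, _sent in enumerate(para["sentences"]):
                s_node = ("sentence", para["title"], s_idx)
                nodes.append(s_node)
                edges[p_node].append(s_node)              # paragraph -> sentence edge
                for ent in para["entities"].get(s_idx, []):
                    e_node = ("entity", ent)
                    if e_node not in nodes:
                        nodes.append(e_node)
                    edges[s_node].append(e_node)          # sentence -> entity edge
        return nodes, edges

    # Running example: P1 (the film) links to P2 (its director).
    nodes, edges = build_hierarchical_graph(
        "The director of the romantic comedy 'Big Stone Gap' is based in what New York city?",
        [
            {"title": "Big Stone Gap (film)",
             "sentences": ["Big Stone Gap is a romantic comedy directed by Adriana Trigiani."],
             "entities": {0: ["Adriana Trigiani"]}},
            {"title": "Adriana Trigiani",
             "sentences": ["Adriana Trigiani is based in Greenwich Village, New York City."],
             "entities": {0: ["Greenwich Village, New York City"]}},
        ],
    )

In the actual model, the node representations of such a graph are initialized with a pre-trained encoder (BERT or RoBERTa, per the table captions below) and then updated through graph propagation, as the abstract describes.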
Methods
  • Effectiveness of Paragraph Selection: The proposed HGN relies on effective paragraph selection to find relevant multi-hop paragraphs. Table 3 compares a threshold-based strategy, Top-2 and Top-4 selection from the ranker, and fixed 2- and 4-paragraph budgets (the threshold-based strategy selects 3.26 paragraphs on average).
  • In DFGN, paragraphs are selected based on a threshold to maintain high recall (98.27%), leading to a low precision (60.28%).
  • Compared to both threshold-based and pure Top-N-based paragraph selection, the two-step paragraph selection process is more accurate, achieving 94.53% precision and 94.53% recall (a minimal sketch of such a two-step strategy follows this list).
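
The bullets above contrast a threshold-based strategy with a two-step selection process; the following is a minimal sketch of a generic two-step strategy (a first hop driven by title matching, then a ranker filling the remaining slots up to a fixed budget). The function name, the title-match heuristic, and the ranker_scores interface are assumptions for illustration, not the authors' exact procedure.

    # Sketch of a two-step paragraph selection strategy (heuristics are assumed).
    def select_paragraphs(question, paragraphs, ranker_scores, max_paras=4):
        """paragraphs: list of dicts with a 'title' key; ranker_scores: one float per paragraph."""
        # Step 1: keep first-hop paragraphs whose title appears in the question.
        selected = [i for i, p in enumerate(paragraphs)
                    if p["title"].lower() in question.lower()]
        # Step 2: fill the remaining slots with the ranker's highest-scoring paragraphs.
        ranked = sorted(range(len(paragraphs)),
                        key=lambda i: ranker_scores[i], reverse=True)
        for i in ranked:
            if len(selected) >= max_paras:
                break
            if i not in selected:
                selected.append(i)
        return selected[:max_paras]

Compared with a recall-oriented threshold, capping the selection at a small fixed budget is what keeps precision high while still covering both hops.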
Results
  • Results on the Leaderboard: Table 1 and Table 2 summarize the results on the hidden test set of HotpotQA in the Distractor and Fullwiki settings, respectively.
  • HGN achieves a Joint EM/F1 score of 43.57/71.03 and 35.63/59.86 in the Distractor and Fullwiki settings, respectively, an absolute improvement of 2.36/0.38 and 6.45/4.55 points over the previous state of the art (a sketch of how the joint metric combines answer and supporting-fact scores follows this list).
  • The authors conduct detailed analysis on the dev set to identify the source of the performance gain.
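
For context on the Joint EM/F1 numbers above: HotpotQA's joint metrics combine the answer score and the supporting-fact score for each example, following the benchmark's published definition. The sketch below is a simplified per-example view; the function name and argument layout are illustrative.

    # Joint EM/F1 for a single example, following HotpotQA's joint-metric definition.
    def joint_metrics(ans_em, ans_prec, ans_rec, sp_em, sp_prec, sp_rec):
        joint_em = ans_em * sp_em        # 1 only if answer and supporting facts are both exact
        joint_prec = ans_prec * sp_prec  # precisions multiply
        joint_rec = ans_rec * sp_rec     # recalls multiply
        denom = joint_prec + joint_rec
        joint_f1 = 2 * joint_prec * joint_rec / denom if denom > 0 else 0.0
        return joint_em, joint_f1

The leaderboard figures (e.g., 43.57/71.03 Joint EM/F1) are these per-example scores averaged over the hidden test set.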
Conclusion
  • The authors propose a new approach, Hierarchical Graph Network (HGN), for multi-hop question answering.
  • To capture clues from different granularity levels, the HGN model weaves heterogeneous nodes into a single unified graph.
  • Experiments with detailed analysis demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance on the HotpotQA benchmark.
  • In the Fullwiki setting, an off-the-shelf paragraph retriever is adopted for selecting relevant context from a large corpus of text.
  • Future work includes investigating the interaction and joint training between HGN and the paragraph retriever for further performance improvement.
Summary
  • Introduction:

    In contrast to one-hop question answering (Rajpurkar et al., 2016; Trischler et al., 2016; Lai et al., 2017), where answers can be derived from a single paragraph (Wang and Jiang, 2017; Seo et al., 2017; Liu et al., 2018; Devlin et al., 2019), recent studies have increasingly focused on multi-hop reasoning across multiple documents or paragraphs for question answering.
  • Popular tasks include WikiHop (Welbl et al., 2018), ComplexWebQuestions (Talmor and Berant, 2018), and HotpotQA (Yang et al., 2018).
  • In order to correctly answer the question (“The director of the romantic comedy ‘Big Stone Gap’ is based in what New York city?”), the model first needs to identify P1 as a relevant paragraph, whose title contains keywords that appear in the question (“Big Stone Gap”).
  • The model then hops from P1 to P2, the paragraph describing the film's director, and from P2 the span “Greenwich Village, New York City” is selected as the predicted answer (the two-hop chain is sketched below).
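
To make the two-hop chain explicit, the snippet below writes it out as (source, relation, target) triples and prints the path; the relation labels are informal descriptions added here, not edge types defined in the paper.

    # The example's two-hop reasoning chain as simple triples.
    chain = [
        ("question", "title match", "P1: Big Stone Gap (film)"),
        ("P1", "mentions the director", "Adriana Trigiani"),
        ("Adriana Trigiani", "described in", "P2: Adriana Trigiani"),
        ("P2", "contains the answer span", "Greenwich Village, New York City"),
    ]
    for src, relation, dst in chain:
        print(f"{src} --[{relation}]--> {dst}")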
Tables
  • Table 1: Results on the test set of HotpotQA in the Distractor setting. HGN achieves state-of-the-art results at the time of submission (Sep. 27, 2019). (†) indicates unpublished work. BERT-wwm is used for context encoding. Leaderboard: https://hotpotqa.github.io/
  • Table 2: Results on the test set of HotpotQA in the Fullwiki setting. HGN, when combined with the SemanticRetrievalMRS retrieval system, achieves state-of-the-art results at the time of submission (Oct. 7, 2019). (†) indicates unpublished work. RoBERTa-large is used for context encoding. Leaderboard: https://hotpotqa.github.io/
  • Table 3: Performance of paragraph selection on the dev set of HotpotQA based on BERT-base
  • Table 4: Results with selected paragraphs on the dev set in the Distractor setting
  • Table 5: Ablation study on the effectiveness of the hierarchical graph on the dev set in the Distractor setting. RoBERTa-large is used for context encoding
  • Table 6: Ablation study on the proposed multi-task loss. RoBERTa-large is used for context encoding
  • Table 7: Results with different pre-trained language models on the dev set in the Distractor setting. (†) is unpublished work with results on the test set, using BERT whole word masking (wwm)
Related work
  • Multi-Hop QA: Multi-hop question answering requires a model to aggregate scattered pieces of evidence across multiple documents to predict the right answer. WikiHop (Welbl et al., 2018) and HotpotQA (Yang et al., 2018) are two recent datasets designed for this purpose. Specifically, WikiHop is constructed using the schema of the underlying knowledge bases, thus limiting answers to entities only. HotpotQA, on the other hand, consists of free-form text collected from Amazon Mechanical Turk workers, which results in significantly more diverse questions and answers. HotpotQA also focuses more on explainability, by requiring supporting facts as the reasoning chain for deriving the correct answer. Two settings are provided in HotpotQA: the Distractor setting requires techniques for multi-hop reading comprehension, while the Fullwiki setting is more focused on information retrieval.
References
  • Anonymous. 2020a. Latent question reformulation and information accumulation for multi-hop machine reading. Submitted to ICLR.
  • Anonymous. 2020b. Learning to retrieve reasoning paths over Wikipedia graph for question answering. Submitted to ICLR.
  • Anonymous. 2020c. Transformer-XH: Multi-hop question answering with extra hop attention. Submitted to ICLR.
  • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In ACL.
  • Jifan Chen and Greg Durrett. 2019. Understanding dataset design choices for multi-hop reasoning. In NAACL.
  • Jifan Chen, Shih-ting Lin, and Greg Durrett. 2019. Multi-hop question answering via reasoning chains. arXiv preprint arXiv:1910.02610.
  • Nicola De Cao, Wilker Aziz, and Ivan Titov. 2019. Question answering by reasoning across documents with graph convolutional networks. In NAACL.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. 2018. Neural models for reasoning over multiple mentions using coreference. In NAACL.
  • Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, and Jie Tang. 2019. Cognitive graph for multi-hop reading comprehension at scale. In ACL.
  • Yair Feldman and Ran El-Yaniv. 2019. Multi-hop paragraph retrieval for open-domain question answering. arXiv preprint arXiv:1906.06606.
  • Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, et al. 2019. Multi-step entity-centric information retrieval for multi-hop question answering. arXiv preprint arXiv:1909.07598.
  • Yichen Jiang and Mohit Bansal. 2019a. Avoiding reasoning shortcuts: Adversarial evaluation, training, and model development for multi-hop QA. In ACL.
  • Yichen Jiang and Mohit Bansal. 2019b. Self-assembling modular networks for interpretable multi-hop reasoning. In EMNLP.
  • Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
  • Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale reading comprehension dataset from examinations. In EMNLP.
  • Xiaodong Liu, Yelong Shen, Kevin Duh, and Jianfeng Gao. 2018. Stochastic answer networks for machine reading comprehension. In ACL.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2019a. Compositional questions do not necessitate multi-hop reasoning. In ACL.
  • Sewon Min, Victor Zhong, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2019b. Multi-hop reading comprehension through question decomposition and rescoring. In ACL.
  • Kosuke Nishida, Kyosuke Nishida, Masaaki Nagata, Atsushi Otsuka, Itsumi Saito, Hisako Asano, and Junji Tomita. 2019. Answering while summarizing: Multi-task learning for multi-hop QA with evidence extraction. In ACL.
  • Peng Qi, Xiaowen Lin, Leo Mehr, Zijian Wang, and Christopher D. Manning. 2019. Answering complex open-domain questions through iterative query generation. In EMNLP.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP.
  • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In ICLR.
  • Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, and Daniel Gildea. 2018. Exploring graph-structured passage representation for multi-hop reading comprehension with graph neural networks. arXiv preprint arXiv:1809.02040.
  • Alon Talmor and Jonathan Berant. 2018. The web as a knowledge-base for answering complex questions. In NAACL.
  • Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2016. NewsQA: A machine comprehension dataset. arXiv preprint arXiv:1611.09830.
  • Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, and Bowen Zhou. 2019. Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In ACL.
  • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.
  • Shuohang Wang and Jing Jiang. 2017. Machine comprehension using Match-LSTM and answer pointer. In ICLR.
  • Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. TACL.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clément Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Yunxuan Xiao, Yanru Qu, Lin Qiu, Hao Zhou, Lei Li, Weinan Zhang, and Yong Yu. 2019. Dynamically fused graph network for multi-hop reasoning. In ACL.
  • Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Hong Wang, Shiyu Chang, Murray Campbell, and William Yang Wang. 2019. Simple yet effective bridge reasoning for open-domain multi-hop question answering. arXiv preprint arXiv:1909.07597.
  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP.
  • Yixin Nie, Songhe Wang, and Mohit Bansal. 2019. Revealing the importance of semantic retrieval for machine reading at scale. In EMNLP.