Question Answering by Reasoning Across Documents with Graph Convolutional Networks.

North American Chapter of the Association for Computational Linguistics (NAACL), 2019


Abstract

Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a method which integrates and reasons over information spread within documents and across multiple documents. We frame it as an inference problem on a graph: mentions of entities are nodes of this graph, while edges encode relations between different mentions (e.g., within- and cross-document coreference), and graph convolutional networks are applied to these graphs and trained to perform multi-step reasoning.

Code

Data

Introduction
  • The long-standing goal of natural language understanding is the development of systems which can acquire knowledge from text collections.
  • Fresh interest in reading comprehension tasks was sparked by the availability of large-scale datasets, such as SQuAD (Rajpurkar et al., 2016) and CNN/Daily Mail (Hermann et al., 2015), enabling end-to-end training of neural models (Seo et al., 2016; Xiong et al., 2016; Shen et al., 2017).
  • These systems, given a text and a question, need to answer the query relying on the given document.
  • Though there is no guarantee that a question cannot be answered by relying just on a single sentence, the authors ensure that it is answerable using a chain of reasoning crossing document boundaries
Highlights
  • The long-standing goal of natural language understanding is the development of systems which can acquire knowledge from text collections
  • Though there is no guarantee that a question cannot be answered by relying just on a single sentence, the authors ensure that it is answerable using a chain of reasoning crossing document boundaries
  • Despite not using recurrent document encoders, the full Entity-GCN (graph convolutional network) model achieves over a 2% improvement over the best previously published results
  • We present the building blocks that make up our Entity-GCN model, namely, an entity graph used to relate mentions to entities within and across documents, a document encoder used to obtain representations of mentions in context, and a relational graph convolutional network that propagates information through the entity graph (a toy sketch of the graph construction follows this list)
  • The knowledge base (KB) is only used for constructing WIKIHOP: Welbl et al. (2018) retrieved the supporting documents Sq from the corpus by looking at mentions of subject and object entities in the text
  • Entity-GCN outperforms all previous work by over 2 percentage points
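As a rough illustration of the first building block above, the snippet below constructs an entity graph in the spirit the highlights describe: nodes are entity mentions in the supporting documents, and edge types distinguish mentions that co-occur in the same document from mentions of the same entity string across documents. The edge-type names and the string-matching shortcut (standing in for the coreference system used in the paper) are illustrative assumptions, not the authors' code.

```python
from itertools import combinations

# Illustrative edge types: same-document pairs, cross-document string matches,
# and all remaining pairs (the paper's graph is richer, e.g. it also uses coreference).
DOC_BASED, MATCH, COMPLEMENT = "doc-based", "match", "complement"

def build_entity_graph(mentions):
    """mentions: list of (doc_id, entity_string) pairs, one per detected mention.
    Returns (i, j, edge_type) triples over mention indices."""
    edges = []
    for (i, (doc_i, ent_i)), (j, (doc_j, ent_j)) in combinations(enumerate(mentions), 2):
        if doc_i == doc_j:
            edges.append((i, j, DOC_BASED))   # mentions within the same document
        elif ent_i.lower() == ent_j.lower():
            edges.append((i, j, MATCH))       # same entity string across documents
        else:
            edges.append((i, j, COMPLEMENT))  # every other pair, keeping the graph connected
    return edges

# Toy usage: two documents, three mentions.
print(build_entity_graph([(0, "London"), (0, "United Kingdom"), (1, "London")]))
# [(0, 1, 'doc-based'), (0, 2, 'match'), (1, 2, 'complement')]
```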
Methods
  • The authors explain their method, first introducing the dataset they focus on, WIKIHOP (Welbl et al., 2018), as well as the task abstraction (a sketch of a sample's shape follows this list).
  • The KB is only used for constructing WIKIHOP: Welbl et al. (2018) retrieved the supporting documents Sq from the corpus by looking at mentions of subject and object entities in the text.
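To make the task abstraction concrete, each WIKIHOP instance pairs a query with a set of candidate answers Cq and supporting documents Sq. The dataclass below is only an assumed approximation of the released sample format (field names may differ), shown here to fix notation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WikiHopSample:
    """Assumed shape of one query-documents sample; an illustration, not the official schema."""
    sample_id: str
    query: str             # a relation plus a subject entity, e.g. "country_of_citizenship <subject>"
    candidates: List[str]  # candidate answers Cq
    supports: List[str]    # supporting documents Sq
    answer: str            # gold answer, always one of the candidates

def is_well_formed(sample: WikiHopSample) -> bool:
    # Basic sanity check when loading data: the gold answer must appear among the candidates.
    return sample.answer in sample.candidates
```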
Results
  • Despite not using recurrent document encoders, the full Entity-GCN model achieves over a 2% improvement over the best previously published results.
  • As the model is efficient, the authors report results for an ensemble, which brings a further 3.6% improvement and is only 3% below the human performance reported by Welbl et al. (2018).
  • The results suggest that WIKIHOP genuinely requires multi-hop inference, as the best model is 6.1% and 8.4% more accurate than this local model, in unmasked and masked settings, respectively.
Conclusions
  • The authors designed a graph neural network that operates over a compact graph representation of a set of documents where nodes are mentions of entities and edges signal relations such as within- and cross-document coreference.
  • The model learns to answer questions by gathering evidence from different documents via a differentiable message passing algorithm that updates node representations based on their neighbourhood (the update rule is written out after this list).
  • The authors' model outperforms previously published results, and ablations provide substantial evidence in favour of multi-step reasoning.
  • The authors make the model fast by using pre-trained embeddings
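The message passing step mentioned above can be written, in the plain relational GCN form of Schlichtkrull et al. (2018) that this work builds on, as follows. This is a minimal statement that omits the gating used in the full Entity-GCN, and the symbols (edge types R, neighbourhoods N_i^r, affine maps f_r and f_s) follow the usual R-GCN notation rather than the paper's exact equations.

```latex
% One layer of relational message passing for node i:
% sum messages from neighbours under each edge type r, add a self-loop term,
% and apply a nonlinearity.
h_i^{(l+1)} = \sigma\left(
    \sum_{r \in R} \frac{1}{|\mathcal{N}_i^r|}
    \sum_{j \in \mathcal{N}_i^r} f_r\left(h_j^{(l)}\right)
    + f_s\left(h_i^{(l)}\right)
\right)
```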
Tables
  • Table 1: WIKIHOP dataset statistics from Welbl et al. (2018): number of candidates and documents per sample and document length
  • Table 2: Accuracy of different models on the WIKIHOP closed test set and public validation set. Our Entity-GCN outperforms recent prior work without learning any language model to process the input, instead relying on a pre-trained one (ELMo, without fine-tuning it) and applying R-GCN to reason among entities in the text. * with coreference for the unmasked dataset and without coreference for the masked one
  • Table 3: Ablation study on the WIKIHOP validation set. The full model is our Entity-GCN with all of its components; other rows indicate models trained without the component of interest. We also report baselines using GloVe instead of ELMo, with and without R-GCN. For the full model we report mean ±1 std over 5 runs
  • Table 4: Accuracy and precision at K (P@K in the table) analysis, overall and per query type. Avg. |Cq| indicates the average number of candidates, with one standard deviation
  • Table 5: Model architecture
  • Table 6: Samples from the WIKIHOP set where Entity-GCN fails. p indicates the predicted likelihood
Related Work
  • In previous work, BiDAF (Seo et al., 2016), FastQA (Weissenborn et al., 2017), CorefGRU (Dhingra et al., 2018), MHPGM (Bauer et al., 2018), and Weaver / Jenga (Raison et al., 2018) have been applied to multi-document question answering. The first two mainly focus on single-document QA, and Welbl et al. (2018) adapted both of them to work with WIKIHOP. They process each instance of the dataset by concatenating all d ∈ Sq in a random order, adding document separator tokens. They trained using the first answer mention in the concatenated document and evaluated exact match at test time. CorefGRU, similarly to us, encodes relations between entity mentions in the document. Instead of using graph neural network layers, as we do, they augment RNNs with jump links corresponding to pairs of coreferent mentions. MHPGM uses a multi-attention mechanism in combination with external commonsense relations to perform multiple hops of reasoning. Weaver is a deep co-encoding model that uses several alternating biLSTMs to process the concatenated documents and the query.
Funding
  • This project is supported by SAP Innovation Center Network, ERC Starting Grant BroadSem (678254) and the Dutch Organization for Scientific Research (NWO) VIDI 639.022.518
  • Wilker Aziz is supported by the Dutch Organisation for Scientific Research (NWO) VICI Grant nr. 277-89-002.
Study Subjects and Analysis
query-documents samples: 2451
The test set is not publicly available, and therefore we measure performance on the validation set in almost all experiments. WIKIHOP has 43,738 / 5,129 / 2,451 query-documents samples in the training, validation and test sets respectively, for a total of 51,318 samples. The authors constructed the dataset as described in Section 2.1, selecting samples with a graph traversal up to a maximum chain length of 3 documents (see Table 1 for additional dataset statistics).

supporting documents: 50
First of all, we look at which types of questions our model answers well or poorly. There are more than 150 query types in the validation set, but we report the three with the best and the three with the worst accuracy among those that have at least 50 supporting documents and at least 5 candidates. We show results in Table 4.
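The P@K numbers in Table 4 measure whether the gold candidate falls within the model's top-K scored candidates; a small helper along the following lines (illustrative, not the authors' evaluation code) reproduces that statistic.

```python
def precision_at_k(scores, gold_index, k):
    """scores: one model score per candidate (higher is better) for a single sample.
    Returns 1.0 if the gold candidate ranks within the top k, else 0.0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if gold_index in ranked[:k] else 0.0

def average_p_at_k(samples, k):
    """samples: list of (scores, gold_index) pairs, e.g. all samples of one query type."""
    values = [precision_at_k(scores, gold, k) for scores, gold in samples]
    return sum(values) / len(values)
```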

References
  • Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Simaan. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1957–1967. Association for Computational Linguistics.
  • Lisa Bauer, Yicheng Wang, and Mohit Bansal. 2018. Commonsense for generative multi-hop question answering tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4220–4230. Association for Computational Linguistics.
  • Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 785–794, Copenhagen, Denmark. Association for Computational Linguistics.
  • Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 188–197. Association for Computational Linguistics.
  • Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515. Association for Computational Linguistics.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
  • Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5:101–115.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
  • Martin Raison, Pierre-Emmanuel Mazare, Rajarshi Das, and Antoine Bordes. 2018. Weaver: Deep coencoding of questions and documents for machine reading. In Proceedings of the International Conference on Machine Learning (ICML).
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web, pages 593– 607, Cham. Springer International Publishing.
  • Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. 2017. Reasonet: Learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1047–1055. ACM.
  • Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, and Daniel Gildea. 2018. Exploring Graph-structured Passage Representation for Multihop Reading Comprehension with Graph Neural Networks. arXiv preprint arXiv:1809.02040.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Denny Vrandecic. 2012. Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web, pages 1063–1064. ACM.
  • Dirk Weissenborn, Georg Wiese, and Laura Seiffe. 2017. Making neural qa as simple as possible but not simpler. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 271–280. Association for Computational Linguistics.
  • Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.
  • Caiming Xiong, Victor Zhong, and Richard Socher. 2016. Dynamic coattention networks for question answering. arXiv preprint arXiv:1611.01604.
  • Yuhao Zhang, Peng Qi, and Christopher D. Manning. 2018a. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2205–2215. Association for Computational Linguistics.
  • Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J Smola, and Le Song. 2018b. Variational reasoning for question answering with knowledge graph. The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
  • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension. International Conference on Learning Representations (ICLR).
  • 2. For the query representation q, we apply 2 bi-LSTM layers of 256 and 128 hidden units to its ELMo vectors. The concatenation of the forward and backward states results in a 256-dimensional question representation.
  • 4. All transformations f∗ in the R-GCN layers are affine, and they keep the input and output dimensionality of node representations the same (512-dimensional).
  • 5. Eventually, a 2-layer MLP with [256, 128] hidden units takes the concatenation of {h_i^(L)}, i = 1..N, and q to predict the probability that a candidate node v_i may be the answer to the query q (see Equation 1). We train our models with a batch size of 32 for at most 20 epochs using the Adam optimizer (Kingma and Ba, 2015) with β1 = 0.9, β2 = 0.999 and a learning rate of 10^-4. To help against overfitting, we employ dropout (drop rate ∈ {0, 0.1, 0.15, 0.2, 0.25}) (Srivastava et al., 2014) and early stopping on validation accuracy. We report the best results of each experiment based on accuracy on the validation set. A combined sketch of items 2, 4 and 5 is given below.
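Putting items 2, 4 and 5 together, the following is a minimal PyTorch sketch of the components they describe: a two-layer bi-LSTM query encoder, a relational graph layer whose per-relation transformations are affine and keep node states 512-dimensional, and a 2-layer MLP scorer over candidate nodes. It is an illustration under the stated hyperparameters only; the ELMo features, the gating mechanism, dropout placement, and many details of the authors' implementation are omitted or simplified, and all class names are ours.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Item 2: two bi-LSTM layers (256 and 128 hidden units per direction) over ELMo
    vectors; concatenating the final forward/backward states yields a 256-dim query."""
    def __init__(self, elmo_dim=1024):
        super().__init__()
        self.lstm1 = nn.LSTM(elmo_dim, 256, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(512, 128, bidirectional=True, batch_first=True)

    def forward(self, elmo_tokens):                    # (batch, time, elmo_dim)
        h1, _ = self.lstm1(elmo_tokens)
        _, (h_n, _) = self.lstm2(h1)                   # h_n: (2, batch, 128)
        return torch.cat([h_n[0], h_n[1]], dim=-1)     # (batch, 256)

class RelationalLayer(nn.Module):
    """Item 4: affine per-relation transformations that keep node states 512-dim."""
    def __init__(self, dim=512, num_relations=4):
        super().__init__()
        self.per_relation = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_relations)])
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, nodes, adjacencies):             # nodes: (N, 512); one (N, N) matrix per relation
        out = self.self_loop(nodes)
        for A, f_r in zip(adjacencies, self.per_relation):
            degree = A.sum(dim=1, keepdim=True).clamp(min=1.0)
            out = out + (A @ f_r(nodes)) / degree      # mean over neighbours of each relation
        return torch.relu(out)

class CandidateScorer(nn.Module):
    """Item 5: a 2-layer MLP with [256, 128] hidden units over [node ; query]."""
    def __init__(self, node_dim=512, query_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(node_dim + query_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, nodes, query):                   # nodes: (N, 512); query: (256,)
        q = query.unsqueeze(0).expand(nodes.size(0), -1)
        return self.mlp(torch.cat([nodes, q], dim=-1)).squeeze(-1)   # one logit per candidate node

# Training configuration as stated in item 5 (dropout modules omitted above for brevity):
# params = [*QueryEncoder().parameters(), *RelationalLayer().parameters(), *CandidateScorer().parameters()]
# optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999))
```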
Authors
Nicola De Cao
Wilker Aziz