Neural Module Networks for Reasoning over Text

ICLR, 2020.

Abstract:

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, performing well on synthetic visual QA domains. However, we find that it is challenging to learn these models for non-synthetic questions on open-domain text, where a model needs to deal with the diversity of natural language and perform a broader range of reasoning.
Introduction
  • Being formalism-free and close to an end-user task, QA is increasingly becoming a proxy for gauging a model’s natural language understanding capability (He et al, 2015; Talmor et al, 2018).
  • Neural module networks (NMNs; Andreas et al, 2016) extend semantic parsers by making the program executor a learned function composed of neural network modules.
  • These modules are designed to perform basic reasoning tasks and can be composed to perform complex reasoning over unstructured knowledge.
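    To make the idea of composing modules concrete, here is a minimal, self-contained sketch, not the authors' learned implementation: toy find and find-num modules pass attention distributions to one another. The module names follow the paper, but the example paragraph, the bodies, and the proximity heuristic are illustrative assumptions.

```python
import numpy as np

# Toy paragraph; in the real model these would be contextualized token representations.
tokens = ["Brady", "threw", "a", "45", "yard", "pass", ";", "Smith",
          "kicked", "a", "32", "yard", "field", "goal"]
number_positions = {3: 45.0, 10: 32.0}   # token index -> numeric value

def find(query_words):
    """Stand-in for the learned `find` module: an attention distribution
    over paragraph tokens that mention the query."""
    scores = np.array([1.0 if t.lower() in query_words else 0.0 for t in tokens])
    return scores / scores.sum()

def find_num(token_attention):
    """Stand-in for the learned `find-num` module: shifts attention from the
    attended event tokens onto nearby number tokens and returns a
    distribution over the paragraph's numbers."""
    num_idx = list(number_positions)
    scores = np.array([sum(token_attention[j] / (1 + abs(i - j))
                           for j in range(len(tokens))) for i in num_idx])
    probs = scores / scores.sum()
    return {number_positions[i]: p for i, p in zip(num_idx, probs)}

# "How many yards was the field goal?"  ~  find-num(find("field goal"))
print(find_num(find({"field", "goal"})))   # most mass on 32.0
```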
Highlights
  • Being formalism-free and close to an end-user task, QA is increasingly becoming a proxy for gauging a model’s natural language understanding capability (He et al, 2015; Talmor et al, 2018)
  • Neural module networks (NMNs; Andreas et al, 2016) extend semantic parsers by making the program executor a learned function composed of neural network modules
  • Our contributions are two-fold: first, we extend neural module networks to answer compositional questions against a paragraph of text as context
  • To overcome issues in learning, (a) we introduce an unsupervised auxiliary loss to provide an inductive bias to the execution of the find-num, find-date, and relocate modules (§4.1); and (b) we provide heuristically-obtained supervision for question programs and intermediate module outputs (§4.2) for a subset of questions (5–10%); a sketch of one plausible form of this auxiliary loss follows this list
  • We show how to use neural module networks to answer compositional questions requiring symbolic reasoning against natural language text
  • While we have demonstrated marked success in broadening the scope of neural modules and applying them to open-domain text, it remains a significant challenge to extend these models to the full range of reasoning required even just for the DROP dataset
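    As referenced above, the auxiliary loss biases find-num, find-date, and relocate toward grounding events to nearby numbers and dates. Below is a minimal sketch of one plausible proximity-based form of such a loss; the tensor shapes, window size, and exact formulation are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def proximity_aux_loss(token_to_num_logits, token_pos, num_pos, window=10):
    """Sketch of an unsupervised proximity loss: push each paragraph token's
    attention over number tokens onto numbers within `window` tokens of it.
    Shapes, window size, and the exact form are illustrative assumptions."""
    attn = F.softmax(token_to_num_logits, dim=-1)                  # (T, N), rows sum to 1
    dist = (token_pos.unsqueeze(1) - num_pos.unsqueeze(0)).abs()   # (T, N) token-to-number distances
    mask = (dist <= window).float()                                # 1 where the number is nearby
    in_window_mass = (attn * mask).sum(dim=-1).clamp(min=1e-8)     # attention mass on nearby numbers
    return -torch.log(in_window_mass).mean()

# Toy usage: a 20-token paragraph with number tokens at positions 4, 11, and 17.
logits = torch.randn(20, 3, requires_grad=True)
loss = proximity_aux_loss(logits, torch.arange(20.0), torch.tensor([4.0, 11.0, 17.0]))
loss.backward()
```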
Methods
  • Dataset

    We perform experiments on a portion of the recently released DROP dataset (Dua et al, 2019), which to the best of our knowledge is the only dataset that requires the kind of compositional and symbolic reasoning that our model aims to solve.
  • Our model possesses diverse but limited reasoning capability; we try to automatically extract questions within the scope of our model based on their first n-gram (a toy version of this prefix filter is sketched after this list)
  • These n-grams were selected by performing manual analysis on a small set of questions.
  • Since the DROP test set is hidden, this test set is extracted from the validation data
  • Though this is a subset of the full DROP dataset, it is still a sizeable dataset that allows drawing meaningful conclusions.
  • We make our subset and splits available publicly with the code
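    As referenced above, a toy version of the first-n-gram filter might look like the following; the prefix list is a made-up stand-in, not the actual n-grams used to build the subset.

```python
# Hypothetical first-n-gram filter for selecting in-scope DROP questions.
# The prefixes below are illustrative stand-ins, not the paper's actual list.
PREFIXES = (
    "how many yards was",
    "who kicked the",
    "how many field goals",
    "which happened first",
    "how many years after",
)

def in_scope(question: str) -> bool:
    """A question is kept if its lowercased text starts with a known prefix."""
    return question.lower().strip().startswith(PREFIXES)

questions = [
    "How many yards was the longest field goal?",
    "What was the name of the treaty signed in 1648?",
]
print([q for q in questions if in_scope(q)])   # keeps only the first question
```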
Results
  • We compare to the best-performing publicly available models: NAQANet (Dua et al, 2019), NABERT+ (Kinley & Lin, 2019), TAG-NABERT+ (Efrat et al, 2019), and MTMSN (Hu et al, 2019), all trained on the same data as our model.
  • Table 2a compares our model’s performance to state-of-the-art models on our full test set.
  • Using BERT representations, our model’s performance increases to 77.4 F1 and outperforms SoTA models that use BERT representations, such as MTMSN (76.5 F1).
  • This shows the efficacy of our proposed model in understanding complex compositional questions and performing multi-step reasoning.
Conclusion
  • We show how to use neural module networks to answer compositional questions requiring symbolic reasoning against natural language text.
  • We define probabilistic modules that propagate uncertainty about symbolic reasoning operations in a way that is end-to-end differentiable (a toy differentiable comparison in this spirit is sketched after this list).
  • While we have demonstrated marked success in broadening the scope of neural modules and applying them to open-domain text, it remains a significant challenge to extend these models to the full range of reasoning required even just for the DROP dataset.
  • Future research is necessary to continue bridging these reasoning gaps
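    As a concrete illustration of propagating uncertainty through a symbolic operation differentiably, the sketch below computes the probability that one paragraph number exceeds another when both are given as distributions. It mirrors the spirit of the paper's probabilistic comparison modules but is not the exact formulation.

```python
import torch

# Numbers extracted from the paragraph (shared support for both distributions).
values = torch.tensor([3.0, 7.0, 12.0, 45.0])

def prob_greater(p, q):
    """P(X > Y) for X ~ p and Y ~ q, both distributions over `values`.
    Every operation is differentiable, so uncertainty produced by upstream
    modules flows straight through this symbolic comparison."""
    greater = (values.unsqueeze(1) > values.unsqueeze(0)).float()  # greater[i, j] = 1 if values[i] > values[j]
    return p @ greater @ q

p = torch.tensor([0.1, 0.2, 0.6, 0.1])   # e.g. output of a find-num-style module
q = torch.tensor([0.7, 0.2, 0.1, 0.0])   # e.g. output of another find-num call
print(prob_greater(p, q))                # a single differentiable probability
```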
Tables
  • Table 1: Description of the modules we define and their expected behaviour. All inputs and outputs are represented as distributions over tokens, numbers, and dates as described in §3.1
  • Table 2: Performance of different models on the dataset and across different question types
Related work
  • Semantic parsing techniques have long been used for compositional question understanding. Approaches have used labeled logical forms (Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005) or weak QA supervision (Clarke et al, 2010; Berant et al, 2013; Reddy et al, 2014) to learn parsers that answer questions against structured knowledge bases. These have also been extended to QA using symbolic reasoning against semi-structured tables (Pasupat & Liang, 2015; Krishnamurthy et al, 2017; Neelakantan et al, 2016). Recently, BERT-based models for DROP have been proposed (Hu et al, 2019; Andor et al, 2019; Kinley & Lin, 2019), but all of these models essentially perform a multi-class classification over pre-defined programs. Our model, on the other hand, provides an interpretable, compositional parse of the question and exposes its intermediate reasoning steps.

    [Ablation table, caption only: (a) Effect of auxiliary supervision: the auxiliary loss contributes significantly to performance, whereas module output supervision has little effect; training diverges without the auxiliary loss for the BERT-based model. (b) Performance with less training data: our model performs significantly better than the baseline with less training data, showing the efficacy of explicitly modeling compositionality.]

    For combining learned execution modules with semantic parsing, many variations of NMNs have been proposed. NMN (Andreas et al, 2016) uses a PCFG parser to parse the question and only learns module parameters. N2NMNs (Hu et al, 2017) simultaneously learn to parse and execute but require pre-training the parser. Gupta & Lewis (2018) propose an NMN model for QA against knowledge graphs and learn the execution of semantic operators from QA supervision alone. Recent works (Gupta & Lewis, 2018; Mao et al, 2019) also use domain knowledge to ease learning, applying curriculum learning to first train the executor on simple questions for which parsing is not an issue. All of these approaches perform reasoning in synthetic domains, whereas our model is applied to natural language. Concurrently, Jiang & Bansal (2019) apply NMNs to HotpotQA (Yang et al, 2018), but their model comprises only 3 modules and is not capable of performing symbolic reasoning.
Funding
  • This material is based upon work sponsored in part by the DARPA MCS program under Contract No. N660011924033 with the United States Office of Naval Research, an ONR award, the LwLL DARPA program, and a grant from AI2.
Reference
  • Daniel Andor, Luheng He, Kenton Lee, and Emily Pitler. Giving BERT a Calculator: Finding operations and arguments with reading comprehension. ArXiv, abs/1909.00109, 2019.
  • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In CVPR, 2016.
  • Avia Efrat, Elad Segal, and Mor Shoham. Tag-based multi-span extraction in reading comprehension. 2019. URL https://github.com/eladsegal/project-NLP-AML.
  • Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, 2009.
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In EMNLP, 2013.
  • James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. Driving semantic parsing from the world's response. In CoNLL, 2010.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
  • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In NAACL-HLT, 2019.
  • Shi Feng, Eric Wallace, Alvin Grissom, Mohit Iyyer, Pedro Rodriguez, and Jordan L. Boyd-Graber. Pathologies of neural models make interpretation difficult. In EMNLP, 2018.
  • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew E. Peters, Michael Schmitz, and Luke S. Zettlemoyer. AllenNLP: A deep semantic natural language processing platform. ArXiv, abs/1803.07640, 2018.
  • Nitish Gupta and Mike Lewis. Neural compositional denotational semantics for question answering. In EMNLP, 2018.
  • Luheng He, Mike Lewis, and Luke S. Zettlemoyer. Question-answer driven semantic role labeling: Using natural language to annotate natural language. In EMNLP, 2015.
  • Minghao Hu, Yuxing Peng, Zhiheng Huang, and Dongsheng Li. A multi-type multi-span network for reading comprehension that requires discrete reasoning. In EMNLP, 2019.
  • Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to reason: End-to-end module networks for visual question answering. In ICCV, 2017.
  • Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In EMNLP, 2017.
  • Yichen Jiang and Mohit Bansal. Self-assembling modular networks for interpretable multi-hop reasoning. In EMNLP, 2019.
  • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross B. Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR, 2017.
  • Jambay Kinley and Raymond Lin. NABERT+: Improving numerical reasoning in reading comprehension. 2019. URL https://github.com/raylin1000/drop-bert.
  • Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. Neural semantic parsing with type constraints for semi-structured tables. In EMNLP, 2017.
  • Percy Liang, Michael I. Jordan, and Dan Klein. Learning dependency-based compositional semantics. Computational Linguistics, 2011.
  • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In ICLR, 2019.
  • Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, and Dario Amodei. Learning a natural language interface with neural programmer. In ICLR, 2016.
  • Panupong Pasupat and Percy Liang. Compositional semantic parsing on semi-structured tables. In ACL, 2015.
  • Siva Reddy, Mirella Lapata, and Mark Steedman. Large-scale semantic parsing without question-answer pairs. TACL, 2014.
  • Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In NAACL-HLT, 2018.
  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP, 2018.
  • John M. Zelle and Raymond J. Mooney. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence, 1996.
  • Luke S. Zettlemoyer and Michael Collins. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI, 2005.
  • Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, and Hai Zhao. SG-Net: Syntax-guided machine reading comprehension. arXiv preprint arXiv:1908.05147, 2019.
Appendix: Pre-processing
  • We pre-process the paragraphs to extract the numbers and dates in them. For numbers, we use a simple strategy in which every paragraph token that can be parsed as a number is extracted; for example, 200 in "200 women". The total number of number tokens in the paragraph is denoted Ntokens. We do not normalize numbers based on their units and leave that for future work. To extract dates from the paragraph, we run spaCy NER and collect all "DATE" mentions, then normalize them with an off-the-shelf date parser; for example, the date mention "19th November, 1961" is normalized to (19, 11, 1961) (day, month, year). The total number of date tokens is denoted Dtokens.
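    A minimal sketch of this pre-processing, using spaCy NER for DATE mentions and the dateparser package as the off-the-shelf date parser; the specific packages and normalization details are assumptions based on the description above, not the authors' exact pipeline.

```python
import spacy
import dateparser  # assumed stand-in for the unnamed off-the-shelf date parser

nlp = spacy.load("en_core_web_sm")

def extract_numbers_and_dates(paragraph: str):
    """Collect (token index, value) pairs for number tokens and
    (entity start, (day, month, year)) tuples for DATE mentions."""
    doc = nlp(paragraph)
    numbers = []
    for tok in doc:                       # e.g. picks up 200 in "200 women"
        try:
            numbers.append((tok.i, float(tok.text.replace(",", ""))))
        except ValueError:
            pass
    dates = []
    for ent in doc.ents:
        if ent.label_ == "DATE":          # spaCy NER date mentions
            parsed = dateparser.parse(ent.text)
            if parsed:                    # "19th November, 1961" -> (19, 11, 1961)
                dates.append((ent.start, (parsed.day, parsed.month, parsed.year)))
    return numbers, dates

print(extract_numbers_and_dates("On 19th November, 1961, 200 women marched."))
```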