Towards Interpretable Reasoning over Paragraph Effects in Situation

EMNLP 2020.

Keywords: neural network, multilayer perceptron, machine reading comprehension, network module, novel situation

Abstract:

We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect described in a background paragraph and apply the knowledge to a novel situation. Existing works ignore the complicated reasoning process and solve it with a one-step “black box” model. Inspired by human cognitive processes, we propose a sequential approach that leverages neural network modules to implement each step of the reasoning process.

Introduction
  • As a long-standing fundamental task of natural language processing, machine reading comprehension (MRC) has attracted remarkable attention recently, and various MRC datasets have been studied (Rajpurkar et al., 2018; Dua et al., 2019b; Choi et al., 2018; Yang et al., 2018). Among them, reasoning over paragraph effects in situation (ROPES for short) is a very challenging scenario that requires understanding knowledge from a background paragraph and applying it to answer questions about a novel situation.
  • Table 1 shows an example from the ROPES dataset (Lin et al., 2019): the background passage states that developmental difficulties can usually be treated by using iodized salt, the situation passage describes two villages using different salt, and the questions ask which village has more or fewer people experiencing developmental difficulties.
  • Background excerpt: “Before iodized salt was developed, some people experienced a number of developmental difficulties, including problems with thyroid gland function and mental retardation.” Q: Which village had less people experience developmental difficulties? A: Salt
Highlights
  • As a long-standing fundamental task of natural language processing, machine reading comprehension (MRC) has attracted remarkable attention recently, and various MRC datasets have been studied (Rajpurkar et al., 2018; Dua et al., 2019b; Choi et al., 2018; Yang et al., 2018). Among them, reasoning over paragraph effects in situation (ROPES for short) is a very challenging scenario that requires understanding knowledge from a background paragraph and applying it to answer questions about a novel situation
  • Inspired by human cognitive processes, in this paper, we propose a sequential approach that leverages neural network modules to implement each step of this process
  • Its final component is a Reasoning module that infers the comparison of the mentioned worlds in terms of the effect property. These modules are trained in an end-to-end manner, and an auxiliary loss over intermediate latent decisions further boosts model accuracy
  • These results show that, compared to a one-step “black box” model, our interpretable approach, which mimics the human reasoning process, has a better capability of conducting such complex reasoning
  • We list the performance of our approach and the baseline model when using only a randomly sampled 10% of the training data in Table 3. Both the neural network modules and the answer prediction model in our approach are trained with only 1,074 questions
  • Experimental results demonstrate the effectiveness of each module, and analysis of intermediate outputs shows good interpretability of the inference process, in contrast to “black box” models
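The Reasoning module's comparative inference (compare the two worlds on the cause property, then apply the predicted cause–effect relation) can be sketched as follows; the function name and the +1/-1 encoding are illustrative assumptions, not the authors' actual implementation:

```python
def reason_over_worlds(cause_comparison: int, relation: int) -> int:
    """Infer which world ranks higher on the effect property.

    cause_comparison: +1 if world 1 has the larger cause-property value,
                      -1 if world 2 does.
    relation:         +1 if the cause and effect properties are positively
                      related, -1 if negatively related.
    Returns +1 (world 1) or -1 (world 2).
    """
    return cause_comparison * relation

# Mirroring the paper's running example: world 1 stores more data (cause),
# and storage volume is positively related to CPU load (effect), so
# world 1 is predicted to take the higher CPU load.
print(reason_over_worlds(+1, +1))  # prints 1, i.e. world 1
```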
Methods
  • As shown in Figure 1, the approach consists of three components: contextual encoding, interpretable reasoning, and answer prediction.

    3.1 Contextual Encoding

The authors use RoBERTa (Devlin et al., 2019; Liu et al., 2019) to encode the background, situation, and question together and generate contextualized embeddings.
  • Hb ∈ R^(m×d), Hs ∈ R^(n×d), and Hq ∈ R^(l×d) are the contextual embeddings for the background, situation, and question, respectively, where m, n, and l are their token lengths and d is the dimension of the hidden states.
  • The module aims to identify the concerned worlds in the situation according to the question.
  • The authors can handle multiple worlds by extending the module with more MLPs.
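A minimal sketch of how the jointly encoded sequence might be split back into Hb, Hs, and Hq, and how an MLP head could score situation tokens as candidate world mentions. The random embeddings stand in for RoBERTa's output, and the single scoring head is an assumption rather than the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l, d = 5, 4, 3, 8      # background/situation/question lengths, hidden size

# Stand-in for RoBERTa's output over "[background; situation; question]".
H = rng.standard_normal((m + n + l, d))
Hb, Hs, Hq = H[:m], H[m:m + n], H[m + n:]    # split by segment lengths

def mlp_world_scores(Hs, W1, b1, w2, b2):
    """One MLP head assigning each situation token a world-mention logit.
    Additional heads of the same shape would handle additional worlds."""
    hidden = np.maximum(Hs @ W1 + b1, 0.0)   # ReLU hidden layer
    return hidden @ w2 + b2                  # one logit per situation token

W1 = rng.standard_normal((d, d)); b1 = np.zeros(d)
w2 = rng.standard_normal(d);      b2 = 0.0
scores = mlp_world_scores(Hs, W1, b1, w2, b2)
world_token = int(scores.argmax())           # most likely world mention
```

Here Hb has shape (m, d), Hs (n, d), and Hq (l, d), matching the dimensions above.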
Results
  • Experimental results on the ROPES dataset demonstrate the effectiveness and explainability of the proposed approach.
  • Table 3 shows the question answering performance of different models; the approach outperforms the RoBERTa-large model by 8.4% and 6.4% in terms of EM and F1 scores, respectively.
  • Example (ID: 710693196). Background: “Fish reproduce sexually. They lay eggs that can be fertilized either inside or outside of the body.” Situation: “Group A consists of fish, and group B consists of non fish creatures in the water. He started to see the differences between these two groups.” Q: In group B, would fertilization most likely take place inside or outside of mother’s body? A: inside
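The EM and F1 numbers reported above are the standard extractive-QA metrics. A minimal sketch, omitting the official evaluation script's answer normalization (e.g. stripping articles and punctuation):

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    """1 if the prediction matches the gold answer exactly (case-insensitive)."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between prediction and gold answer."""
    p, g = pred.lower().split(), gold.lower().split()
    common = Counter(p) & Counter(g)          # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("inside", "inside"))                   # prints 1
print(round(token_f1("inside the body", "inside"), 2))   # prints 0.5
```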
Conclusion
  • Conclusion and Future Work

    In this paper, the authors aim to answer ROPES questions in an interpretable way by leveraging five neural network modules.
  • The authors find that, with explicitly designed compositional modeling of the inference process, the approach achieves accuracy similar to that of strong baselines trained on the full-size training data while using only a few training examples, which indicates a better generalization capability.
  • Extending these models to a larger scope of question types or more complex scenarios remains a challenge, and the authors will further investigate the trade-off between explainability and scalability.
Tables
  • Table1: An example from the ROPES dataset. Effect property tokens are highlighted in blue, cause property tokens in orange, and world tokens in green
  • Table2: ROPES statistics. ROPES is the only dataset that requires reasoning over paragraph effects in situation: given a background paragraph that contains knowledge about relations of causes and effects and a novel situation, questions about applying the knowledge to the novel situation need to be answered
  • Table3: Performance of different models on the ROPES dataset under cross-validation setting
  • Table4: A running example with visualized intermediate outputs of our approach. The approach outputs several intermediate results. First, it identifies two concerned worlds, 8 AM and 1 PM, from the situation. Then it predicts the effect property, CPU load goes up, given which the cause property in the background (i.e. storing large volumes of data) and the corresponding values for the two worlds (i.e. 301 Gigabytes and sleep) are predicted. Next, it compares the two worlds in terms of the cause property and predicts that world 1 is larger than world 2. It also predicts that the cause property and the effect property are positively related, i.e. the relation is classified as 1. Finally, it reasons that world 1 takes a higher CPU load than world 2. This example demonstrates that our approach not only predicts the final answer for the question, but also provides detailed explanations for the reasoning process
  • Table5: Performance of each module
  • Table6: Detailed parameters used in Interpretable Reasoning; we provide search bounds for each hyperparameter and list the hyperparameter combination for our best model. Other unmentioned parameters are kept the same as those used in BERT
  • Table7: Detailed parameters used in Answer Prediction; we provide search bounds for each hyperparameter and list the hyperparameter combinations for our best model and the baseline model. Other unmentioned parameters are kept the same as those used in BERT
  • Table8: An example with auxiliary supervision labels
  • Table9: Type 1 error cases made by our model
  • Table10: Type 2 error cases could not be solved by our model
  • Table11: Examples correctly answered by our model in an interpretable manner
Related work
  • Neural network modules have been studied in several works. Andreas et al. (2016) propose neural module networks with a semantic parser for visual question answering. Jiang and Bansal (2019) apply a self-assembling modular network with only three modules, Find, Relocate, and Compare, to HotpotQA (Yang et al., 2018). Gupta et al. (2019) extend neural module networks to answer compositional questions against a paragraph of text as context, and perform symbolic reasoning on a self-pruned subset of DROP (Dua et al., 2019b). Compared with them, we focus on a more challenging MRC task, reasoning over paragraph effects in situation, which has rarely been investigated and requires more complex reasoning. As far as we know, the only two works on this topic (i.e. Lin et al. (2019) and Khashabi et al. (2020)) use a one-step “black box” model. Such an approach performs well on some questions at the expense of limited interpretability. Our work solves this task in a logical manner and exposes intermediate reasoning steps, which improves performance and interpretability concurrently.
Funding
  • We acknowledge that this work is supported by the National Natural Science Foundation of China (No. 61751201) and the National Key R&D Plan (No. 2016QY03D0602). We would also like to thank the anonymous reviewers for their insightful suggestions
References
  • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 39–48.
  • Michael W Browne. 2000. Cross-validation methods. Journal of mathematical psychology, 44(1):108– 132.
  • Prabir Burman. 1989. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503–514.
  • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wentau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question answering in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, and Matt Gardner. 2019a. Orb: An open reading benchmark for comprehensive evaluation of machine reading comprehension. arXiv preprint arXiv:1912.12598.
  • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019b. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2368–2378, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Jonathan St BT Evans. 1984. Heuristic and analytic processes in reasoning. British Journal of Psychology, 75(4):451–468.
  • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, et al. 2020. Evaluating nlp models via contrast sets. arXiv preprint arXiv:2004.02709.
  • Mor Geva, Yoav Goldberg, and Jonathan Berant. 2019. Are we modeling the task or the annotator? an investigation of annotator bias in natural language understanding datasets. arXiv preprint arXiv:1908.07898.
  • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2019. Neural module networks for reasoning over text. arXiv preprint arXiv:1912.04971.
  • Yichen Jiang and Mohit Bansal. 2019. Self-assembling modular networks for interpretable multi-hop reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4474–4484, Hong Kong, China. Association for Computational Linguistics.
  • Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. 2020. Unifiedqa: Crossing format boundaries with a single qa system. arXiv preprint arXiv:2005.00700.
  • Kevin Lin, Oyvind Tafjord, Peter Clark, and Matt Gardner. 2019. Reasoning over paragraph effects in situations. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pages 58– 62, Hong Kong, China. Association for Computational Linguistics.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Kouider Mokhtari and Carla A Reichard. 2002. Assessing students’ metacognitive awareness of reading strategies. Journal of educational psychology, 94(2):249.
  • Kouider Mokhtari and Ravi Sheorey. 2002. Measuring esl students’ awareness of reading strategies. Journal of developmental education, 25(3):2–11.
  • Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784– 789, Melbourne, Australia. Association for Computational Linguistics.
  • Sebastian Raschka. 2018. Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808.
  • Ravi Sheorey and Kouider Mokhtari. 2001. Differences in the metacognitive awareness of reading strategies among native and non-native readers. System, 29(4):431–449.
  • Steven A Sloman. 1996. The empirical case for two systems of reasoning. Psychological bulletin, 119(1):3.
  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, Brussels, Belgium. Association for Computational Linguistics.