REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

We propose a recursive erasure memory network that refines evidence for commonsense question answering.

Abstract:

When answering a question, people often draw upon their rich world knowledge in addition to the particular context. While recent works retrieve supporting facts/evidence from commonsense knowledge bases to supply additional information for each question, there is still ample room to improve the quality of that evidence. It is...

Introduction
  • Commonsense question answering has recently become an attractive field because it requires systems to understand commonsense information beyond the words themselves, which is natural for human beings but nontrivial for machines.
  • A popular recent solution resorts to external supporting facts from knowledge bases as evidence, to enhance the question with commonsense knowledge or reasoning logic (Devlin et al. 2019; Liu et al. 2019; Lv et al. 2020; Lin et al. 2019; Xu et al. 2020).
  • There is a need for models that further process and refine this evidence.
Highlights
  • Commonsense question answering has recently become an attractive field because it requires systems to understand commonsense information beyond the words themselves, which is natural for human beings but nontrivial for machines
  • We propose a model named Recursive Erasure Memory Network (REM-Net) for evidence refinement according to the commonsense question, which improves the explainability of the supporting facts
  • The REM-Net is compared with three groups of competitive methods
  • We propose a recursive erasure memory network (REM-Net) that refines evidence for commonsense question answering
  • The recursive procedure leads to repeated use of high-quality supporting facts, so that question answering is driven by useful information (a hedged sketch of this loop follows this list)
  • Experimental results demonstrate that REM-Net is effective for the commonsense QA tasks, and the evidence refinement is interpretable
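
The recursive procedure highlighted above amounts to: score each supporting fact against the question, erase the lowest-scoring facts, and repeat for a fixed number of hops. The following is a minimal sketch of that loop, not the authors' implementation; the `score_fact` interface, the word-overlap scorer, and the fixed erase ratio are all assumptions for illustration (the real REM module scores facts with learned attention over memory slots).

```python
# Minimal sketch of the recursive erasure loop (NOT the authors' code).
# A pluggable `score_fact` callable stands in for the learned REM scoring
# module (assumption for illustration).
from typing import Callable, List, Tuple

def recursive_erasure(question: str,
                      facts: List[str],
                      score_fact: Callable[[str, str], float],
                      hops: int = 3,
                      erase_ratio: float = 0.25) -> List[Tuple[str, float]]:
    """Recursively estimate each fact's quality w.r.t. the question and
    erase the lowest-scoring fraction at each hop, so that later hops and
    the answer module only see the refined, high-quality facts."""
    kept = list(facts)
    for _ in range(hops):
        if len(kept) <= 1:
            break
        scored = sorted(((f, score_fact(question, f)) for f in kept),
                        key=lambda pair: pair[1], reverse=True)
        n_keep = max(1, int(len(scored) * (1.0 - erase_ratio)))
        kept = [f for f, _ in scored[:n_keep]]  # erase the low-quality tail
    return [(f, score_fact(question, f)) for f in kept]

# Toy usage with a naive word-overlap scorer (assumption, for demo only):
if __name__ == "__main__":
    overlap = lambda q, f: float(len(set(q.lower().split()) & set(f.lower().split())))
    facts = ["more rain makes soil wetter",
             "wet soil erodes faster",
             "cats are mammals"]
    print(recursive_erasure("what happens to soil if there is more rain?",
                            facts, overlap))
```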
Methods
  • Compared Methods: The authors compare the performance of REM-Net with several groups of competitive methods.
  • Group 1: Baselines. For WIQA, Majority predicts the most frequent answer option in the training set, and Polarity predicts the answer option containing the most comparative words (a minimal sketch of both baselines follows this list).
  • Other compared baselines include Adaboost (Freund and Schapire 1995), Commonsense-RC (Wang et al. 2018a), GPT-FT (Radford et al. 2018), and DCMN (Zhang et al. 2020).
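
Since Majority and Polarity are simple rule-based baselines, they can be stated in a few lines. The sketch below is an illustrative reconstruction rather than Tandon et al.'s code, and the comparative-word list is an assumption.

```python
# Illustrative reconstruction of the two WIQA rule baselines described
# above (not Tandon et al.'s code); the comparative-word list is assumed.
from collections import Counter
from typing import List

def majority_baseline(train_answers: List[str]) -> str:
    """Always predict the answer option most frequent in the training set."""
    return Counter(train_answers).most_common(1)[0][0]

# Assumed vocabulary of comparative cues; the actual list is not given here.
COMPARATIVE_WORDS = {"more", "less", "fewer", "greater", "smaller"}

def polarity_baseline(options: List[str]) -> str:
    """Predict the answer option containing the most comparative words."""
    def n_comparatives(option: str) -> int:
        return sum(tok in COMPARATIVE_WORDS for tok in option.lower().split())
    return max(options, key=n_comparatives)
```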
Results
  • The experimental results are presented in Table 1 and Table 2.
  • On the CosmosQA dataset, REM-Net outperforms all of the compared methods.
  • On WIQA, REM-Net (RoBERTa-LARGE) is mainly inferior on the “in-para” and “out-of-para” question types, but surpasses the compared methods on the “no-effect” type.
  • This is because the majority of the “in-para” and “out-of-para” evidence is already meaningful to the question, so the erasure operation of the REM module provides limited benefit.
Conclusion
  • The authors propose a recursive erasure memory network (REM-Net) that refines evidence for commonsense question answering.
  • It recursively estimates the quality of each supporting fact based on the question, and refines the supporting-fact set accordingly.
  • Experimental results demonstrate that REM-Net is effective for the commonsense QA tasks, and the evidence refinement is interpretable.
  • The authors also evaluate the quality of generated evidence against retrieved evidence, finding that generated evidence gives better performance.
Tables
  • Table 1: Results (accuracy %) on the WIQA test set, including accuracies on three separate question types (In = “in-para”, Out = “out-of-para”, No = “no-effect”) and on the overall test set. The baselines labeled with ∗ are taken from Tandon et al. (2019), in which the test set used is slightly different
  • Table 2: Results (accuracy %) on the CosmosQA development set
  • Table 3: Ablation studies on REM-Net (BERT-BASE) conducted on WIQA. E signifies the erasure manipulation, while R indicates the recursive mechanism. In = “in-para” type, Out = “out-of-para” type, No = “no-effect” type
  • Table 4: Ablation studies on REM-Net (BERT-LARGE) conducted on CosmosQA. E denotes the erasure manipulation, while R refers to the recursive mechanism
Related work
  • Commonsense Question Answering. Similar to open-domain question answering tasks (Rajpurkar, Jia, and Liang 2018; Kwiatkowski et al. 2019), commonsense question answering (Tandon et al. 2019; Huang et al. 2019) requires open-domain information to support answer prediction. But unlike open-domain question answering, where text comprehension is straightforward and the retrieved open-domain information relates directly to the questions, in commonsense question answering the open-domain information is more complicated in that it acts as evidence to bridge the understanding gap in the commonsense questions. Current works leverage open-domain information either by incorporating external knowledge as evidence or by training models to generate evidence. Lv et al. (2020) extract knowledge from ConceptNet (Speer, Chin, and Havasi 2017) and Wikipedia, and learn features with a GCN (Kipf and Welling 2016) and graph attention (Velickovic et al. 2017). Zhong et al. (2019) retrieve ConceptNet (Speer, Chin, and Havasi 2017) triplets and train two functions to measure direct and indirect connections between concepts; a hedged sketch of this kind of triplet retrieval is given after this paragraph. Rajani et al. (2019) train a GPT (Radford et al. 2018) to generate plausible evidence for the questions; during evaluation, the model generates evidence and predicts the multi-choice answers concurrently. Ye et al. (2019) automatically construct a commonsense multi-choice dataset from ConceptNet triplets. However, the retrieved or generated evidence is usually not further refined, and some of it can be unnecessary or even confounding for answering the questions. The proposed model refines the original evidence to discover the facts that most strongly support the commonsense questions, and therefore provides stronger interpretability.
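
As a concrete illustration of the retrieval-based line of work, the sketch below fetches candidate evidence triplets from ConceptNet's public REST API. It is a generic example, not the pipeline of any paper cited above; the naive stopword-based concept extraction is an assumption for illustration.

```python
# Hedged sketch of evidence retrieval from ConceptNet, in the spirit of
# the retrieval-based works above (not any specific paper's pipeline).
# Uses ConceptNet's public REST API (api.conceptnet.io).
import requests

def retrieve_triplets(concept: str, limit: int = 10) -> list:
    """Fetch (start, relation, end) triplets touching an English concept."""
    url = f"http://api.conceptnet.io/c/en/{concept.lower().replace(' ', '_')}"
    edges = requests.get(url, params={"limit": limit}).json().get("edges", [])
    return [(e["start"]["label"], e["rel"]["label"], e["end"]["label"])
            for e in edges]

def evidence_for_question(question: str,
                          stopwords=frozenset({"a", "the", "is", "of",
                                               "to", "what", "if"})):
    """Collect candidate evidence triplets for every content word
    (naive keyword extraction; real systems use concept matching)."""
    concepts = [w for w in question.lower().strip("?").split()
                if w not in stopwords]
    return {c: retrieve_triplets(c, limit=5) for c in concepts}
```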
Funding
  • This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. U19A2073 and No. 61976233, the Guangdong Province Basic and Applied Basic Research (Regional Joint Fund-Key) Grant No. 2019B1515120039, the Natural Science Foundation of Shenzhen under Grant No. 2019191361, Zhijiang Lab’s Open Fund (No. 2020AA3AB14), and the CSIG Young Fellow Support Fund.
Study subjects and analysis
cases: 3
REM-Net refines the evidence in a multi-hop manner; the performance gap between the different kinds of evidence is small, but generated evidence still gives better results. We show three cases to inspect the quality of the refined evidence, as presented in Figure 6. Figure 6 (1) shows a successful case in WIQA

Reference
  • Bordes, A.; Usunier, N.; Chopra, S.; and Weston, J. 2015. Large-scale Simple Question Answering with Memory Networks. ArXiv, abs/1506.02075.
  • Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; and Choi, Y. 2019. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. In Proc. of ACL.
  • Cao, Y.; Fang, M.; and Tao, D. 2019. BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering. In Proc. of NAACL.
  • Chen, D.; Bolton, J.; and Manning, C. D. 2016. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. In Proc. of ACL.
  • Dai, Z.; Dai, W.; Liu, Z.; Rao, F.; Chen, H.; Zhang, G.; Ding, Y.; and Liu, J. 2019. Multi-Task Multi-Head Attention Memory Network for Fine-Grained Sentiment Analysis. In Proc. of NLPCC.
  • Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL.
  • Dhingra, B.; Liu, H.; Yang, Z.; Cohen, W. W.; and Salakhutdinov, R. 2017. Gated-Attention Readers for Text Comprehension. In Proc. of ACL.
  • Freund, Y.; and Schapire, R. E. 1995. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. In Proc. of EuroCOLT, 23–37. Springer.
  • Huang, L.; Bras, R. L.; Bhagavatula, C.; and Choi, Y. 2019. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. In Proc. of EMNLP.
  • Kingma, D. P.; and Ba, J. L. 2015. Adam: A Method for Stochastic Optimization. In Proc. of ICLR.
  • Kipf, T. N.; and Welling, M. 2016. Semi-Supervised Classification with Graph Convolutional Networks. ArXiv, abs/1609.02907.
  • Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A. P.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; Toutanova, K.; Jones, L.; Kelcey, M.; Chang, M.-W.; Dai, A. M.; Uszkoreit, J.; Le, Q.; and Petrov, S. 2019. Natural Questions: A Benchmark for Question Answering Research. TACL.
  • Lai, G.; Xie, Q.; Liu, H.; Yang, Y.; and Hovy, E. H. 2017. RACE: Large-scale ReAding Comprehension Dataset From Examinations. In Proc. of EMNLP.
  • Lin, B. Y.; Chen, X.; Chen, J.; and Ren, X. 2019. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. In Proc. of EMNLP.
  • Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, abs/1907.11692.
  • Lv, S.; Guo, D.; Xu, J.; Tang, D.; Duan, N.; Gong, M.; Shou, L.; Jiang, D.; Cao, G.; and Hu, S. 2020. Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering. In Proc. of AAAI.
  • Miller, A. H.; Fisch, A.; Dodge, J.; Karimi, A.-H.; Bordes, A.; and Weston, J. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proc. of EMNLP.
  • Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; Krishnamurthy, J.; Lao, N.; Mazaitis, K.; Mohamed, T.; Nakashole, N.; Platanios, E.; Ritter, A.; Samadi, M.; Settles, B.; Wang, R.; Wijaya, D.; Gupta, A.; Chen, X.; Saparov, A.; Greaves, M.; and Welling, J. 2015. Never-Ending Learning. In Proc. of AAAI.
  • Parikh, A. P.; Täckström, O.; Das, D.; and Uszkoreit, J. 2016. A Decomposable Attention Model for Natural Language Inference. In Proc. of EMNLP.
  • Radford, A.; Narasimhan, K.; Salimans, T.; and Sutskever, I. 2018. Improving Language Understanding with Unsupervised Learning. Technical report, OpenAI.
  • Rajani, N. F.; McCann, B.; Xiong, C.; and Socher, R. 2019. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. In Proc. of ACL.
  • Rajpurkar, P.; Jia, R.; and Liang, P. 2018. Know What You Don’t Know: Unanswerable Questions for SQuAD. In Proc. of ACL.
  • Rajpurkar, P.; Zhang, J.; Lopyrev, K.; and Liang, P. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. ArXiv, abs/1606.05250.
  • Richardson, M.; Burges, C. J.; and Renshaw, E. 2013. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In Proc. of EMNLP.
  • Sap, M.; Bras, R. L.; Allaway, E.; Rashkin, H.; Bhagavatula, C.; Lourie, N.; Roof, B.; Smith, N.; and Choi, Y. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In Proc. of AAAI.
  • Speer, R.; Chin, J.; and Havasi, C. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proc. of AAAI.
  • Sukhbaatar, S.; Szlam, A.; Weston, J.; and Fergus, R. 2015. End-To-End Memory Networks. In Proc. of NIPS.
  • Talmor, A.; Herzig, J.; Lourie, N.; and Berant, J. 2019. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proc. of NAACL-HLT, 4149–4158.
  • Tandon, N.; Dalvi, B.; Sakaguchi, K.; Clark, P.; and Bosselut, A. 2019. WIQA: A Dataset for “What if...” Reasoning over Procedural Text. In Proc. of EMNLP.
  • Trinh, T. H.; and Le, Q. V. 2018. A Simple Method for Commonsense Reasoning. ArXiv, abs/1806.02847.
  • Trischler, A.; Wang, T.; Yuan, X.; Harris, J.; Sordoni, A.; Bachman, P.; and Suleman, K. 2016. NewsQA: A Machine Comprehension Dataset. ArXiv, abs/1611.09830.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention Is All You Need. In Proc. of NIPS.
  • Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph Attention Networks. In Proc. of ICLR.
  • Wang, L.; Sun, M.; Zhao, W.; Shen, K.; and Liu, J. 2018a. Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension. In Proc. of SemEval.
  • Wang, S.; Yu, M.; Jiang, J.; and Chang, S. 2018b. A Co-Matching Model for Multi-choice Reading Comprehension. In Proc. of ACL.
  • Weston, J.; Bordes, A.; Chopra, S.; Rush, A. M.; van Merrienboer, B.; Joulin, A.; and Mikolov, T. 2016. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In Proc. of ICLR.
  • Weston, J.; Chopra, S.; and Bordes, A. 2015. Memory Networks. In Proc. of ICLR.
  • Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proc. of ICML.
  • Xu, Y.; Fang, M.; Chen, L.; Du, Y.; Zhou, J. T.; and Zhang, C. 2020. Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games. In Proc. of NeurIPS.
  • Ye, Z.-X.; Chen, Q.; Wang, W.; and Ling, Z.-H. 2019. Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models. ArXiv, abs/1908.06725.
  • Zhang, S.; Zhao, H.; Wu, Y.; Zhang, Z.; Zhou, X.; and Zhou, X. 2020. DCMN+: Dual Co-Matching Network for Multi-choice Reading Comprehension. In Proc. of AAAI.
  • Zhong, W.; Tang, D.; Duan, N.; Zhou, M.; Wang, J.; and Yin, J. 2019. Improving Question Answering by Commonsense-Based Pre-training. In Proc. of NLPCC.