REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement
Abstract:
When answering a question, people often draw upon their rich world knowledge in addition to the particular context. While recent works retrieve supporting facts/evidence from commonsense knowledge bases to supply additional information for each question, there is still ample room to improve the quality of that evidence.
Introduction
- Commonsense question answering has recently become an attractive field because it requires systems to understand commonsense information beyond the words themselves, which is natural for human beings but nontrivial for machines.
- A popular recent solution resorts to external supporting facts from knowledge bases as evidence, to enrich the question with commonsense knowledge or reasoning logic (Devlin et al. 2019; Liu et al. 2019; Lv et al. 2020; Lin et al. 2019; Xu et al. 2020).
- There is a need for models that further process and refine this evidence.
Highlights
- Commonsense question answering has recently become an attractive field because it requires systems to understand commonsense information beyond the words themselves, which is natural for human beings but nontrivial for machines
- We propose a model named recursive erasure memory network (REM-Net) that refines evidence according to the commonsense question, which improves the explainability of the supporting facts
- REM-Net is compared with three groups of competitive methods
- We propose a recursive erasure memory network (REM-Net) that refines evidence for commonsense questions (a minimal sketch of the refinement loop follows this list)
- The recursive procedure leads to repeated use of high-quality supporting facts, so that question answering is driven by useful information
- Experimental results demonstrate that REM-Net is effective on the commonsense QA tasks, and that the evidence refinement is interpretable
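The refinement described above can be pictured as a score-and-erase loop. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: the cosine-similarity quality score, the fixed erase_ratio, and the name rem_refine are all our own assumptions.

    import numpy as np

    def rem_refine(question_vec, evidence_vecs, hops=3, erase_ratio=0.25):
        """Score-and-erase loop: at each hop, rate every surviving fact
        against the question and erase the lowest-rated ones, so later
        hops reason over a cleaner evidence set."""
        kept = list(range(len(evidence_vecs)))
        for _ in range(hops):
            if len(kept) <= 1:
                break
            # Hypothetical quality score: cosine similarity to the question.
            scores = np.array([
                float(evidence_vecs[i] @ question_vec)
                / (np.linalg.norm(evidence_vecs[i]) * np.linalg.norm(question_vec) + 1e-8)
                for i in kept
            ])
            n_erase = max(1, int(len(kept) * erase_ratio))
            order = np.argsort(scores)            # ascending: worst facts first
            kept = [kept[i] for i in order[n_erase:]]
        return kept                               # indices of the refined facts

    # Toy usage: five random "facts", refined over three hops.
    rng = np.random.default_rng(0)
    question = rng.normal(size=8)
    evidence = rng.normal(size=(5, 8))
    print(rem_refine(question, evidence))

In the actual model, the surviving facts are what the question-answering module attends over, which is why the recursive erasure makes the supporting evidence easier to inspect.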
Methods
- Compared Methods: The authors compare the performance of REM-Net with several groups of competitive methods.
- Group 1: Baselines.
- For WIQA, Majority predicts the most frequent answer option in the training set.
- Polarity predicts the answer option with the most comparative words.
- The compared methods also include Adaboost (Freund and Schapire 1995), Commonsense-Rc (Wang et al. 2018a), GPT-FT (Radford et al. 2018), and DMCN (Zhang et al. 2020). A few-line sketch of the Majority and Polarity baselines follows this list.
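For concreteness, the two trivial baselines above can be written in a few lines. This is our own sketch; in particular, the comparative-word lexicon is a guess, since the summary does not specify one.

    from collections import Counter

    def majority_baseline(train_answers):
        """Majority: always predict the answer option that is most
        frequent in the training set."""
        return Counter(train_answers).most_common(1)[0][0]

    # Hypothetical comparative-word lexicon (not specified in the paper summary).
    COMPARATIVES = {"more", "less", "fewer", "greater", "smaller", "higher", "lower"}

    def polarity_baseline(options):
        """Polarity: predict the option containing the most comparative words."""
        return max(options, key=lambda o: sum(w in COMPARATIVES for w in o.lower().split()))

    print(majority_baseline(["more", "less", "more", "no effect"]))  # -> "more"
    print(polarity_baseline(["it rains more", "nothing happens"]))   # -> "it rains more"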
Results
- The experimental results are presented in Table 1 and Table 2.
- On the CosmosQA dataset, REM-Net outperforms all of the compared methods.
- On the WIQA question types, REM-Net (RoBERTa-large) is mainly inferior on the “in-para” and “out-of-para” types, but surpasses the compared methods on the “no-effect” type.
- This is because the majority of the “in-para” and “out-of-para” evidence is meaningful to the question, so the erasure operation of the REM module provides limited benefit.
Conclusion
- The authors propose a recursive erasure memory network (REM-Net) that refines evidence for commonsense questions.
- It recursively estimates the quality of each supporting fact based on the question, and refines the supporting fact set accordingly.
- Experimental results demonstrate that REM-Net is effective on the commonsense QA tasks, and that the evidence refinement is interpretable.
- The authors also evaluate the quality of generated evidence against retrieved evidence, finding that generated evidence gives better performance.
Tables
- Table 1: Results (accuracy %) on the WIQA test set, including accuracies on three separate question types (In = “in-para”, Out = “out-of-para”, No = “no-effect”) and on the overall test set. The baselines labeled with ∗ are taken from Tandon et al. (2019), whose test set is slightly different.
- Table 2: Results (accuracy %) on the CosmosQA development set.
- Table 3: Ablation studies on REM-Net (BERT-base) conducted on WIQA. E denotes the erasure manipulation, and R the recursive mechanism. In = “in-para” type, Out = “out-of-para” type, No = “no-effect” type.
- Table 4: Ablation studies on REM-Net (BERT-large) conducted on CosmosQA. E denotes the erasure manipulation, and R the recursive mechanism.
Related work
- Commonsense Question Answering. Similar to open-domain question answering tasks (Rajpurkar, Jia, and Liang 2018; Kwiatkowski et al. 2019), commonsense question answering (Tandon et al. 2019; Huang et al. 2019) requires open-domain information to support answer prediction. But unlike open-domain question answering, where text comprehension is straightforward and the retrieved open-domain information relates directly to the questions, in commonsense question answering the open-domain information is more complicated in that it acts as evidence to bridge the understanding gap in the commonsense questions. Current works leverage open-domain information either by incorporating external knowledge as evidence or by training models to generate evidence. Lv et al. (2020) extract knowledge from ConceptNet (Speer, Chin, and Havasi 2017) and Wikipedia, and learn features with a GCN (Kipf and Welling 2016) and graph attention (Velickovic et al. 2017). Zhong et al. (2019) retrieve ConceptNet (Speer, Chin, and Havasi 2017) triplets and train two functions to measure direct and indirect connections between concepts. Rajani et al. (2019) train a GPT (Radford et al. 2018) to generate reasonable evidence for the questions; during evaluation, the model generates evidence and predicts the multi-choice answers concurrently. Ye et al. (2019) automatically construct a commonsense multi-choice dataset from ConceptNet triplets. However, the retrieved or generated evidence is usually not further refined, and some of it could be unnecessary or even confounding for answering the questions. The proposed model explores refining the original evidence to discover the facts that best support the commonsense questions, and therefore provides stronger interpretations.
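As a concrete illustration of the retrieval step mentioned above, the following sketch pulls candidate facts from ConceptNet's public REST API (api.conceptnet.io). It is a generic example, not the retrieval pipeline of any cited work; the function name and the limit parameter are our own choices.

    import requests

    def retrieve_conceptnet_evidence(concept, limit=10):
        """Fetch candidate supporting facts for a concept from
        ConceptNet's public REST API."""
        url = f"http://api.conceptnet.io/c/en/{concept}"
        data = requests.get(url, params={"limit": limit}).json()
        facts = []
        for edge in data.get("edges", []):
            # Prefer the human-readable sentence when ConceptNet provides one;
            # otherwise assemble one from the triplet's node and relation labels.
            text = edge.get("surfaceText") or "{} {} {}".format(
                edge["start"]["label"], edge["rel"]["label"], edge["end"]["label"])
            facts.append(text)
        return facts

    print(retrieve_conceptnet_evidence("umbrella", limit=5))

Evidence retrieved this way is exactly the kind of noisy supporting-fact set that REM-Net's erasure mechanism is designed to refine.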
Funding
- This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. U19A2073 and No. 61976233, the Guangdong Province Basic and Applied Basic Research (Regional Joint Fund-Key) under Grant No. 2019B1515120039, the Natural Science Foundation of Shenzhen under Grant No. 2019191361, Zhijiang Lab's Open Fund (No. 2020AA3AB14), and the CSIG Young Fellow Support Fund.
Study subjects and analysis
cases: 3
REM-Net refines the evidence in a multi-hop manner; the performance gap between the different kinds of evidence is small, but generated evidence still gives the better result. Three cases are shown in Figure 6 to assess the quality of the refined evidence. Figure 6 (1) shows a successful case on WIQA.
Reference
- Bordes, A.; Usunier, N.; Chopra, S.; and Weston, J. 2015. Large-scale Simple Question Answering with Memory Networks. ArXiv, abs/1506.02075.
- Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; and Choi, Y. 2019. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. In Proc. of ACL.
- Cao, Y.; Fang, M.; and Tao, D. 2019. Bag: Bi-directional attention entity graph convolutional network for multi-hop reasoning question answering. In Proc. of NAACL.
- Chen, D.; Bolton, J.; and Manning, C. D. 2016. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. In Proc. of ACL.
- Dai, Z.; Dai, W.; Liu, Z.; Rao, F.; Chen, H.; Zhang, G.; Ding, Y.; and Liu, J. 2019. Multi-Task Multi-Head Attention Memory Network for Fine-Grained Sentiment Analysis. In Proc. of NLPCC.
- Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. of NAACL.
- Dhingra, B.; Liu, H.; Yang, Z.; Cohen, W. W.; and Salakhutdinov, R. 2017. Gated-Attention Readers for Text Comprehension. In Proc. of ACL.
- Freund, Y.; and Schapire, R. E. 1995. A desicion-theoretic generalization of on-line learning and an application to boosting. In European conference on computational learning theory, 23–37. Springer.
- Huang, L.; Bras, R. L.; Bhagavatula, C.; and Choi, Y. 2019. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. In Proc. of EMNLP.
- Kingma, D. P.; and Ba, J. L. 2015. Adam: A Method for Stochastic Optimization. In Proc. of ICLR.
- Kipf, T. N.; and Welling, M. 2016. Semi-Supervised Classification with Graph Convolutional Networks. ArXiv, abs/1609.02907.
- Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A. P.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; Toutanova, K.; Jones, L.; Kelcey, M.; Chang, M.W.; Dai, A. M.; Uszkoreit, J.; Le, Q.; and Petrov, S. 2019. Natural Questions: A Benchmark for Question Answering Research. Proc. of ACL.
- Lai, G.; Xie, Q.; Liu, H.; Yang, Y.; and Hovy, E. H. 2017. RACE: Large-scale ReAding Comprehension Dataset From Examinations. In Proc. of EMNLP.
- Lin, B. Y.; Chen, X.; Chen, J.; and Ren, X. 2019. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. In Proc. of EMNLP.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, abs/1907.11692.
- Lv, S.; Guo, D.; Xu, J.; Tang, D.; Duan, N.; Gong, M.; Shou, L.; Jiang, D.; Cao, G.; and Hu, S. 2020. Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering. Proc. of AAAI.
- Miller, A. H.; Fisch, A.; Dodge, J.; Karimi, A.-H.; Bordes, A.; and Weston, J. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proc. of EMNLP.
- Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; Krishnamurthy, J.; Lao, N.; Mazaitis, K.; Mohamed, T.; Nakashole, N.; Platanios, E.; Ritter, A.; Samadi, M.; Settles, B.; Wang, R.; Wijaya, D.; Gupta, A.; Chen, X.; Saparov, A.; Greaves, M.; and Welling, J. 2015. Never-Ending Learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15).
- Parikh, A. P.; Tackstrom, O.; Das, D.; and Uszkoreit, J. 2016. A Decomposable Attention Model for Natural Language Inference. In Proc. of EMNLP.
- Radford, A.; Narasimhan, K.; Salimans, T.; and Sutskever, I. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
- Rajani, N. F.; McCann, B.; Xiong, C.; and Socher, R. 2019. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. In Proc. of ACL.
- Rajpurkar, P.; Jia, R.; and Liang, P. 2018. Know What You Don’t Know: Unanswerable Questions for SQuAD. Proc. of ACL.
- Rajpurkar, P.; Zhang, J.; Lopyrev, K.; and Liang, P. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv preprint arXiv:1606.05250.
- Richardson, M.; Burges, C. J.; and Renshaw, E. 2013. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In Proc. of EMNLP.
- Sap, M.; Bras, R. L.; Allaway, E.; Rashkin, H.; Bhagavatula, C.; Lourie, N.; Roof, B.; Smith, N.; and Choi, Y. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. Proc. of AAAI.
- Speer, R.; Chin, J.; and Havasi, C. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proc. of AAAI.
- Sukhbaatar, S.; Szlam, A.; Weston, J.; and Fergus, R. 2015. End-To-End Memory Networks. Proc. of NIPS.
- Talmor, A.; Herzig, J.; Lourie, N.; and Berant, J. 2019. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proc. of NAACL-HLT, 4149–4158.
- Tandon, N.; Dalvi, B.; Sakaguchi, K.; Clark, P.; and Bosselut, A. 2019. WIQA: A dataset for “What if... ” reasoning over procedural text. In Proc. of EMNLP.
- Trinh, T. H.; and Le, Q. V. 2018. A Simple Method for Commonsense Reasoning. ArXiv, abs/1806.02847.
- Trischler, A.; Wang, T.; Yuan, X.; Harris, J.; Sordoni, A.; Bachman, P.; and Suleman, K. 2016. Newsqa: A machine comprehension dataset. arXiv preprint arXiv:1611.09830.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention Is All You Need. In Proc. of NIPS.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph Attention Networks. Proc. of ICLR.
- Wang, L.; Sun, M.; Zhao, W.; Shen, K.; and Liu, J. 2018a. Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension. In Proc. of SemEval.
- Wang, S.; Yu, M.; Jiang, J.; and Chang, S. 2018b. A CoMatching Model for Multi-choice Reading Comprehension. In Proc. of ACL.
- Weston, J.; Bordes, A.; Chopra, S.; Rush, A. M.; van Merrienboer, B.; Joulin, A.; and Mikolov, T. 2016. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In Proc. of ICLR.
- Weston, J.; Chopra, S.; and Bordes, A. 2015. Memory Networks. In Proc. of ICLR.
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proc. of ICML.
- Xu, Y.; Fang, M.; Chen, L.; Du, Y.; Zhou, J. T.; and Zhang., C. 2020. Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games. In Proc. of NeurIPS.
- Ye, Z.-X.; Chen, Q.; Wang, W.; and Ling, Z.-H. 2019. Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models. ArXiv, abs/1908.06725.
- Zhang, S.; Zhao, H.; Wu, Y.; Zhang, Z.; Zhou, X.; and Zhou, X. 2020. DCMN+: Dual Co-Matching Network for Multichoice Reading Comprehension. Proc. of AAAI.
- Zhong, W.; Tang, D.; Duan, N.; Zhou, M.; Wang, J.; and Yin, J. 2019. Improving Question Answering by Commonsense-Based Pre-training. In Proc. of NLPCC.