Program Enhanced Fact Verification with Verbalization and Graph Attention Network

EMNLP 2020.

Keywords: graph attention network; fact verification; accurate program; language understanding; Discovery Accelerator Supplement Grants

Abstract:

Performing fact verification based on structured data is important for many real-life applications and is a challenging research problem, particularly when it involves both symbolic operations and informal inference based on language understanding. In this paper, we present a Program-enhanced Verbalization and Graph Attention Network (P...)

Introduction
  • With the overwhelming information available on the Internet, fact verification has become crucial for many applications such as detecting fake news, rumors, and political deception (Rashkin et al, 2017; Thorne et al, 2018; Goodrich et al, 2019; Vaibhav et al, 2019; Kryscinski et al, 2019), among others.
  • Fig. 1 depicts a simplified example: a table titled ‘Ji-young Oh’, the statement ‘Ji-young Oh played more tournaments in 2008 than any other year’, and the program eq { max { all_rows ; tournaments played } ; hop { filter_eq { all_rows ; year ; 2008 } ; tournaments played } }, which executes to True.
  • Systems are expected to decide whether the facts in the table support the natural language statement.
  • Most prior work verifies facts against unstructured text (... et al, 2018; Yoneda et al, 2018), which is only one type of data where important facts exist.
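To make the program semantics concrete, here is a minimal sketch of executing the Fig. 1 program over a toy table. This is a toy executor, not the authors' implementation, and the per-year tournament counts are made up for illustration:

```python
# Toy illustration: evaluate the Fig. 1 program
# eq { max { all_rows ; tournaments played } ;
#      hop { filter_eq { all_rows ; year ; 2008 } ; tournaments played } }

all_rows = [
    {"year": 2006, "tournaments played": 12},  # hypothetical values
    {"year": 2007, "tournaments played": 19},
    {"year": 2008, "tournaments played": 21},
]

def filter_eq(rows, column, value):
    """Keep rows whose column equals the given value (returns a view)."""
    return [r for r in rows if r[column] == value]

def hop(rows, column):
    """Project the column value of a single-row view."""
    assert len(rows) == 1
    return rows[0][column]

def max_(rows, column):
    """Maximum value of a column over a view."""
    return max(r[column] for r in rows)

def eq(a, b):
    return a == b

label = eq(max_(all_rows, "tournaments played"),
           hop(filter_eq(all_rows, "year", 2008), "tournaments played"))
print(label)  # True for this toy table
```

Each operation returns either a table view (filter_eq) or a value (max, hop), so programs compose by nesting, and the outermost eq yields the verification label.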
Highlights
  • With the overwhelming information available on the Internet, fact verification has become crucial for many applications such as detecting fake news, rumors, and political deception (Rashkin et al, 2017; Thorne et al, 2018; Goodrich et al, 2019; Vaibhav et al, 2019; Kryscinski et al, 2019), among others
  • To effectively enable symbolic operations and integrate them into language-based inference models, we propose a framework centered around programs, i.e., logical forms that can be executed to find evidence in structured data
  • We propose a framework centered around programs and execution to provide symbolic manipulations for table fact verification
  • We propose a verbalization technique together with a graph-based verification network to aggregate and fuse evidence inherently embedded in programs and the original tables for fact verification
  • The experiments show that the proposed model improves the state-of-the-art performance to a 74.4% accuracy on the benchmark dataset TABFACT
  • Our studies reveal the importance of accurate program acquisition for improving the performance of table fact verification
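As background for the graph-based verification component highlighted above, the following is a minimal single-head graph attention layer in the style of Velickovic et al. (2018). It is a generic sketch of graph attention over evidence nodes, not the paper's exact verification network:

```python
import numpy as np

def graph_attention_layer(H, A, W, a, leaky_slope=0.2):
    """Single-head graph attention (Velickovic et al., 2018 style).

    H: (N, F) node features, A: (N, N) 0/1 adjacency (with self-loops),
    W: (F, F') projection matrix, a: (2*F',) attention vector.
    Returns updated node features of shape (N, F').
    """
    Z = H @ W                                   # project node features
    N = Z.shape[0]
    # pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            e = a @ np.concatenate([Z[i], Z[j]])
            logits[i, j] = e if e > 0 else leaky_slope * e
    logits = np.where(A > 0, logits, -1e9)      # mask non-edges
    logits -= logits.max(axis=1, keepdims=True) # stable softmax over neighbors
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z                            # aggregate neighbor features

# toy usage: 3 fully connected evidence nodes, 4-d features projected to 5-d
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
out = graph_attention_layer(H, np.ones((3, 3)),
                            rng.normal(size=(4, 5)), rng.normal(size=(10,)))
print(out.shape)  # (3, 5)
```

In the paper's setting the nodes would be verbalized evidence sentences and table content; here the features and graph are random placeholders.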
Results
  • Overall performance: Table 2 presents the results of different verification models.
  • The authors' proposed method obtains an accuracy of 74.4% on the test set, achieving a new state of the art on this dataset.
Conclusion
  • The authors propose a framework centered around programs and execution to provide symbolic manipulations for table fact verification.
  • The experiments show that the proposed model improves the state-of-the-art performance to a 74.4% accuracy on the benchmark dataset TABFACT.
  • The authors' studies reveal the importance of accurate program acquisition for improving the performance of table fact verification.
  • The authors will investigate the properties of the proposed method on verifying statements with more complicated operations and explore the explainability of the model.
Tables
  • Table1: Examples of generation templates for different operations
  • Table2: Performance (accuracy) of different models on TABFACT. For the Table-BERT baseline, different strategies are used to linearize tables to bridge the semantic gap with statements: Horizontal and Vertical refer to traversing items in tables horizontally or vertically; S denotes statements, T denotes tables, and + indicates the concatenation order between S and T; Concatenate refers to directly concatenating items in tables, while Template converts items in tables into sentences with pre-defined templates. For the LPA baseline, to select one program among all candidates for each statement, either a (weighted) voting strategy or a discriminator is used
  • Table3: Results of different ways of using operations
  • Table4: Ablation results (accuracy) that shows the effectiveness of our graph attention component
  • Table5: Accuracy of different program selection models and corresponding final verification performance based on verbalized evidence derived from each program selection model
  • Table6: Statistics of TABFACT and the split of Train/Val/Test
  • Table7: Details of pre-defined operations
  • Table8: Templates for operations with string or number type executed results
  • Table9: Templates for operations with boolean type executed results
  • Table10: Templates for operations with view or row type executed results
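Tables 1 and 8-10 describe templates that verbalize executed operations into natural-language evidence sentences. A minimal sketch of such template-based verbalization follows; the template strings and slot names here are illustrative, not copied from the paper's tables:

```python
# Illustrative verbalization: map an executed operation to an evidence
# sentence via a slot-filling template (templates below are made up for
# this sketch, not the paper's actual Tables 8-10).
TEMPLATES = {
    "max":       "the maximum value of {column} in the table is {result}",
    "filter_eq": "the rows where {column} is {value} are selected",
    "eq":        "{left} is equal to {right}: {result}",
}

def verbalize(op, **slots):
    """Fill the operation's template with the executed arguments/result."""
    return TEMPLATES[op].format(**slots)

sentence = verbalize("max", column="tournaments played", result=21)
print(sentence)  # the maximum value of tournaments played in the table is 21
```

The resulting sentences can then be consumed by a standard language-inference encoder alongside the original statement and table.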
Funding
  • The first, third, and last authors' research is supported by NSERC Discovery Grants and Discovery Accelerator Supplement (DAS) Grants
Study subjects and analysis
We conduct our experiments on the recently released large-scale dataset TABFACT (Chen et al, 2020). TABFACT contains 92,283, 12,792, and 12,779 table-statement pairs for training, validation, and testing, respectively. Since verifying some statements requires higher-order semantics such as argmax, the test set is further split into simple and complex subsets according to verification difficulty

Reference
  • Yoav Artzi and Luke Zettlemoyer. 2013. Weakly supervised learning of semantic parsers for mapping instructions to actions. Transactions of the Association for Computational Linguistics, 1:49–62.
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 conference on empirical methods in natural language processing, pages 1533–1544.
  • Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415– 1425, Baltimore, Maryland. Association for Computational Linguistics.
  • Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 632–642. The Association for Computational Linguistics.
  • Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, and Christopher Potts. 2016. A fast unified model for parsing and sentence understanding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics.
  • Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017a. Reading Wikipedia to answer opendomain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870– 1879, Vancouver, Canada. Association for Computational Linguistics.
  • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, and Si Wei. 2017b. Neural natural language inference models enhanced with external knowledge. arXiv preprint arXiv:1711.04289.
  • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017c. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1657–1668, Vancouver, Canada. Association for Computational Linguistics.
  • Ben Goodrich, Vinay Rao, Peter J Liu, and Mohammad Saleh. 2019. Assessing the factual accuracy of generated text. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 166–175.
  • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017e. Recurrent neural network-based sentence encoder with gated attention for natural language inference. arXiv preprint arXiv:1708.01353.
  • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2020. Neural module networks for reasoning over text. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
  • Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, and William Yang Wang. 2020. Tabfact: A large-scale dataset for table-based fact verification. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
  • Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. 2010. Recognizing textual entailment: Rational, evaluation and approaches. Journal of Natural Language Engineering, 4.
  • Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. The pascal recognising textual entailment challenge. In Machine Learning Challenges Workshop, pages 177–190. Springer.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, and Denny Zhou. 2019a. Neural logic machines. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019b. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pages 13042–13054.
  • Reza Ghaeini, Sadid A. Hasan, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Fern, and Oladimeji Farri. 2018. DR-BiLSTM: Dependent reading bidirectional LSTM for natural language inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1460–1469, New Orleans, Louisiana. Association for Computational Linguistics.
  • Kelvin Guu, Panupong Pasupat, Evan Liu, and Percy Liang. 2017. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1051–1062, Vancouver, Canada. Association for Computational Linguistics.
  • Andreas Hanselowski, Hao Zhang, Zile Li, Daniil Sorokin, Benjamin Schiller, Claudia Schulz, and Iryna Gurevych. 2018. UKP-athene: Multi-sentence textual entailment for claim verification. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 103–108, Brussels, Belgium. Association for Computational Linguistics.
  • Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Berlin, Germany. Association for Computational Linguistics.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Wojciech Kryscinski, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Evaluating the factual consistency of abstractive text summarization. arXiv preprint arXiv:1910.12840.
  • Nate Kushman, Yoav Artzi, Luke Zettlemoyer, and Regina Barzilay. 2014. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 271–281, Baltimore, Maryland. Association for Computational Linguistics.
  • Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. 2017. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23–33, Vancouver, Canada. Association for Computational Linguistics.
  • Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 3728–3738. Association for Computational Linguistics.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Zhenghao Liu, Chenyan Xiong, Maosong Sun, and Zhiyuan Liu. 2020. Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7342–7351, Online. Association for Computational Linguistics.
  • Bill MacCartney and Christopher D Manning. 2008. Modeling semantic containment and exclusion in natural language inference. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 521–528.
  • Bill MacCartney and Christopher D Manning. 2009. Natural language inference. Ph.D. thesis, Stanford University.
  • Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. 2016. Neural programmer: Inducing latent programs with gradient descent. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
  • Feng Nie, Yunbo Cao, Jinpeng Wang, Chin-Yew Lin, and Rong Pan. 2018. Mention and entity description co-attention for entity disambiguation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 5908–5915. AAAI Press.
  • Yixin Nie, Haonan Chen, and Mohit Bansal. 2019. Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6859–6866.
  • Ankur Parikh, Oscar Tackstrom, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2249–2255, Austin, Texas. Association for Computational Linguistics.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI Technical Report.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
  • Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2931–2937.
  • Shuming Shi, Yuehui Wang, Chin-Yew Lin, Xiaojiang Liu, and Yong Rui. 2015. Automatically solving number word problems by semantic parsing and reasoning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1132–1142, Lisbon, Portugal.
  • James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a large-scale dataset for fact extraction and verification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 809–819. Association for Computational Linguistics.
  • James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, and Arpit Mittal. 2019. The FEVER2.0 shared task. In Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), pages 1–6.
  • Vaibhav Vaibhav, Raghuram Mandyam Annasamy, and Eduard H. Hovy. 2019. Do sentence interactions matter? leveraging sentence level representations for fake news classification. In Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing, TextGraphs@EMNLP 2019, Hong Kong, November 4, 2019, pages 134– 139. Association for Computational Linguistics.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018.
  • Xiaoyu Yang, Xiaodan Zhu, Huasha Zhao, Qiong Zhang, and Yufei Feng. 2019. Enhancing unsupervised pretraining with external knowledge for natural language inference. In Canadian Conference on Artificial Intelligence, pages 413–419. Springer.
  • Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada. Association for Computational Linguistics.
  • Wenpeng Yin and Dan Roth. 2018. TwoWingOS: A two-wing optimization strategy for evidential claim verification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 105–114, Brussels, Belgium. Association for Computational Linguistics.
  • Takuma Yoneda, Jeff Mitchell, Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Ucl machine reading group: Four factor framework for fact finding (hexaf). In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 97–102.
  • Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. arXiv preprint arXiv:1809.08887.
  • Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI ’05, Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh, Scotland, July 26-29, 2005, pages 658–666. AUAI Press.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
  • Wanjun Zhong, Duyu Tang, Zhangyin Feng, Nan Duan, Ming Zhou, Ming Gong, Linjun Shou, Daxin Jiang, Jiahai Wang, and Jian Yin. 2020. LogicalFactChecker: Leveraging logical operations for fact checking with graph module network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6053–6065, Online. Association for Computational Linguistics.
  • Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2019. GEAR: Graph-based evidence aggregating and reasoning for fact verification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 892–901, Florence, Italy. Association for Computational Linguistics.
  • A.1 Statistics of TABFACT Dataset: Table 6 provides the statistics of TABFACT (Chen et al., 2020), a recent large-scale table-based fact verification dataset on which we evaluate our method. Each evidence table comes with 2 to 20 statements and consists of 14 rows and 5-6 columns on average.
  • Programs consist of operations; the definitions of the operations are listed in Table 7, mainly following Chen et al. (2020).