GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

ICLR 2021.

Language model pre-training for table semantic parsing.

Abstract:

We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. …
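As a rough illustration of the synthesis step described in the abstract, the sketch below instantiates a single synchronous production rule over a toy table to produce an aligned question-SQL pair. The rule, the toy table, and the helper function are simplified assumptions made for this sketch, not the authors' actual grammar or code.

```python
import random

# A toy synchronous production rule: the natural-language template and the SQL
# template share the same non-terminals ({table}, {column}, {column2}, {value}),
# mirroring the ROOT -> <alpha, beta> rules sketched in Table 1.  This rule is a
# made-up example, not one taken from the released grammar.
RULE = {
    "nl":  "show the {column} of {table} whose {column2} is {value}",
    "sql": "SELECT {column} FROM {table} WHERE {column2} = {value}",
}

# A toy table with column names and example cell values.
TABLE = {
    "name": "singer",
    "columns": {"name": ["Adele", "Prince"], "age": ["31", "57"]},
}


def instantiate(rule, table):
    """Fill every non-terminal with the same terminal on both sides,
    yielding an aligned (question, SQL) pair."""
    col, col2 = random.sample(list(table["columns"]), 2)
    value = random.choice(table["columns"][col2])
    slots = {
        "table": table["name"],
        "column": col,
        "column2": col2,
        "value": f"'{value}'",
    }
    return rule["nl"].format(**slots), rule["sql"].format(**slots)


if __name__ == "__main__":
    question, sql = instantiate(RULE, TABLE)
    print(question)  # e.g. "show the age of singer whose name is 'Adele'"
    print(sql)       # e.g. "SELECT age FROM singer WHERE name = 'Adele'"
```

Applying many such rules over many tables would yield a synthetic question-SQL corpus of the kind used for pre-training.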

Introduction
  • Tabular data serve as an important information source for human decision makers in many domains, such as finance, health care, and retail.
  • The authors seek to learn contextual representations jointly from structured tabular data and unstructured natural language sentences, with objectives oriented towards table semantic parsing
Highlights
  • Tabular data serve as an important information source for human decision makers in many domains, such as finance, health care, and retail
  • We conducted experiments to answer the following two questions: 1) Can our grammar-augmented pre-training framework for table semantic parsing (GRAPPA) provide better representations for table semantic parsing tasks? 2) What is the benefit of the two pre-training objectives, namely masked language modeling (MLM) and SQL semantic prediction (SSP)? Since GRAPPA is initialized from RoBERTa, we answer the first question by directly comparing the performance of a base parser augmented with GRAPPA and with RoBERTa on table semantic parsing tasks
  • We report the performance of GRAPPA trained with MLM, SSP and a variant with both of them (MLM+SSP)
  • The combined MLM+SSP objective helps GRAPPA achieve better performance than using MLM or SSP alone (a minimal sketch of the combined loss follows this list)
  • We proposed a novel and effective pre-training approach for table semantic parsing
  • When augmented with GRAPPA, the model achieves significantly better performance compared with the baselines using BERT and RoBERTa
  • We introduced GRAPPA, a language model (LM) pre-trained on the synthetic examples with a structured query language (SQL) semantic loss
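To make the two objectives mentioned above more concrete, here is a minimal PyTorch sketch of how a token-level MLM loss and a per-column SSP-style classification loss (predicting which SQL operation, if any, each column participates in) could be combined into one training objective. The head shapes, the pooled column vectors, the operation label set, and the equal weighting are assumptions for illustration, not the authors' exact implementation.

```python
import torch.nn as nn


class MLMPlusSSPLoss(nn.Module):
    """Combine masked language modeling (MLM) over input tokens with an
    SQL-semantic-prediction (SSP) loss over column representations."""

    def __init__(self, hidden_size: int, vocab_size: int, num_operations: int):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)      # token -> vocabulary logits
        self.ssp_head = nn.Linear(hidden_size, num_operations)  # column -> SQL-operation logits
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)        # -100 marks positions without a label

    def forward(self, token_states, mlm_labels, column_vectors, column_labels):
        # token_states:   (batch, seq_len, hidden)  contextual encodings of all input tokens
        # mlm_labels:     (batch, seq_len)          original token ids at masked positions, -100 elsewhere
        # column_vectors: (batch, n_cols, hidden)   one pooled vector per column name in the input
        # column_labels:  (batch, n_cols)           id of the SQL operation each column appears with
        mlm_loss = self.ce(self.mlm_head(token_states).flatten(0, 1), mlm_labels.flatten())
        ssp_loss = self.ce(self.ssp_head(column_vectors).flatten(0, 1), column_labels.flatten())
        return mlm_loss + ssp_loss  # equal weighting is an assumption of this sketch
```

In this reading, the SSP labels come from the synthetic SQL programs, so that term is only available for synthetic examples; real text-table examples would contribute the MLM term alone.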
Methods
  • Semantic parsing data is compositional because utterances are usually related to formal representations such as logical forms and SQL queries.
  • The augmented examples can be used to teach the model to generalize beyond the given training examples.
  • A growing body of work (Zhang et al, 2019b; Herzig et al, 2020b; Campagna et al, 2020; Zhong et al, 2020) shows that utilizing augmented data does not always result in a significant performance gain on cross-domain semantic parsing end tasks.
  • The most likely reason is that models tend to overfit to the canonical input distribution, especially when the generated utterances are very different from the original ones
Results
  • The authors conducted experiments to answer the following two questions: 1) Can GRAPPA provide better representations for table semantic parsing tasks? 2) What is the benefit of the two pre-training objectives, namely MLM and SSP? Since GRAPPA is initialized from RoBERTa, the authors answer the first question by directly comparing the performance of a base parser augmented with GRAPPA and with RoBERTa on table semantic parsing tasks.
  • GRAPPA with MLM+SSP again achieves the best performance compared with the other baselines, obtaining a new state-of-the-art result of 84.7% on this task.
  • It is worth noting that the best model here is better than many models trained in the fully-supervised setting in Table 4
  • This suggests that the inductive biases injected into the pre-trained representations of GRAPPA can significantly help combat the issue of spurious programs introduced by learning from denotations (Pasupat & Liang, 2015; Wang et al., 2019) when gold programs are not available
Conclusion
  • In this paper, the authors proposed a novel and effective pre-training approach for table semantic parsing.
  • The authors introduced GRAPPA, an LM pre-trained on the synthetic examples with an SQL semantic loss.
  • Results on four semantic parsing tasks demonstrated that GRAPPA significantly outperforms RoBERTa. While the pre-training method is surprisingly effective in its current form, the authors view these results primarily as an invitation for more future work in this direction.
  • This work relies on a hand-crafted grammar that often generates unnatural questions; further improvements are likely to come from applying more sophisticated data augmentation techniques.
  • Pre-training might also benefit from synthesizing data with a more compositional grammar that has larger logical-form coverage, and from supervision with more compositional semantic signals
Tables
  • Table1: Examples of non-terminals and production rules in our SCFG. Each production rule ROOT → ⟨α, β⟩ is built from some (x, y) ∈ D by replacing all terminal phrases with non-terminals; ti, ci, and vi stand for any table name, column name, and entry value, respectively (a sketch of this rule-extraction step follows the table list below)
  • Table2: Overview of the four table-based semantic parsing and question answering datasets used in this paper, in fully-supervised (top) and weakly-supervised (bottom) settings. More details in Section 3
  • Table3: Performance on SPIDER. We run each
  • Table4: Performance on fully-sup. WIKISQL
  • Table5: Performance on WIKITABLEQUESTIONS. Results trained on 10% of the data are shown at the bottom
  • Table6: Performance on weakly-sup. WIKISQL. We use (Wang et al, 2019) as our base model
  • Table7: Examples of the inputs and annotations for four semantic parsing tasks. SPIDER and Fully-sup. WIKISQL require full annotation of SQL programs, whereas WIKITABLEQUESTIONS and Weakly-sup. WIKISQL only require annotation of answers (or denotations) of questions
  • Table8: Aggregated datasets for table-and-language tasks
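As a rough sketch of how a production rule like those in Table 1 could be extracted from an existing annotated example, the snippet below replaces the terminal phrases (column names and cell values) in an aligned question-SQL pair with shared non-terminal placeholders. The toy example and the naive string replacement are assumptions for illustration, not the induction procedure actually used by the authors.

```python
def extract_rule(question: str, sql: str, columns, values):
    """Turn one annotated (question, SQL) pair into a synchronous template by
    replacing every terminal phrase with a shared non-terminal placeholder.
    (In the full grammar, table names would also become non-terminals.)"""
    for i, col in enumerate(columns):
        question = question.replace(col, f"{{column{i}}}")
        sql = sql.replace(col, f"{{column{i}}}")
    for i, val in enumerate(values):
        question = question.replace(val, f"{{value{i}}}")
        sql = sql.replace(val, f"{{value{i}}}")
    return question, sql


# A toy annotated example; the resulting pair of templates can later be
# re-instantiated over new tables to synthesize fresh question-SQL pairs.
nl_template, sql_template = extract_rule(
    question="show the age of the singer whose name is Adele",
    sql="SELECT age FROM singer WHERE name = 'Adele'",
    columns=["age", "name"],
    values=["Adele"],
)
print(nl_template)   # show the {column0} of the singer whose {column1} is {value0}
print(sql_template)  # SELECT {column0} FROM singer WHERE {column1} = '{value0}'
```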
Related work
  • Textual-tabular data understanding: Real-world data exist in both structured and unstructured forms. Recently the field has witnessed a surge of interest in joint textual-tabular data understanding problems, such as table semantic parsing (Zhong et al, 2017; Yu et al, 2018b), question answering (Pasupat & Liang, 2015; Chen et al, 2020), retrieval (Zhang et al, 2019a), fact-checking (Chen et al, 2019) and summarization (Parikh et al, 2020; Radev et al, 2020). While most work focuses on single tables, often obtained from the Web, some have extended modeling to more complex structures such as relational databases (Finegan-Dollak et al, 2018; Yu et al, 2018b; Wang et al, 2020). All of these tasks can benefit from better representations of the input text and of the different components of the table and, most importantly, from effective contextualization across the two modalities. Our work aims at obtaining high-quality cross-modal representations via pre-training to potentially benefit all downstream tasks.

    Pre-training for NLP tasks: GRAPPA is inspired by recent advances in pre-training for text (Devlin et al, 2019; Liu et al, 2019; Lewis et al, 2020b;a; Guu et al, 2020). Seminal work in this area shows that textual representations trained with conditional language modeling objectives significantly improve performance on various downstream NLP tasks. This triggered an exciting line of research under the themes of (1) cross-modal pre-training that involves text (Lu et al, 2019; Peters et al, 2019; Yin et al, 2020a; Herzig et al, 2020a) and (2) pre-training architectures and objectives catering to subsets of NLP tasks (Lewis et al, 2020b;a; Guu et al, 2020). GRAPPA extends both directions further. The works closest to ours are TaBERT (Yin et al, 2020a) and TAPAS (Herzig et al, 2020a); both are trained over millions of web tables and relevant but noisy textual context. In comparison, GRAPPA is pre-trained with a novel training objective over synthetic data plus a much smaller but cleaner collection of text-table datasets.
Funding
  • We add an MLM loss on them as a regularization factor, which requires the model to balance between real and synthetic examples during pre-training (a minimal sketch of this mixing follows this list). We note that this consistently improves the performance on all downstream semantic parsing tasks (see Section 4)
  • When augmented with GRAPPA, the model achieves significantly better performance compared with the baselines using BERT and RoBERTa
  • Compared with RoBERTa, our best model with GRAPPA (MLM+SSP) can further improve the performance by 1.8%, leading to a new state-of-the-art performance on this task
  • GRAPPA (MLM) usually improves the performance by around 1%, e.g., a 1.4% gain on SPIDER (dev), 0.8% on WIKITABLEQUESTIONS, and 1.2% on weakly-sup. WIKISQL
  • By pre-training on the synthetic text-to-SQL examples, GRAPPA (SSP) achieves similar performance gains on these tasks, except for a larger 3.9% improvement on the SPIDER dev set, which is what we expected since the grammar is overfitted to SPIDER
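A minimal sketch of the balancing idea in the first bullet above: alternate between batches of synthetic question-SQL examples, which contribute both MLM and SSP losses, and batches of real human-written utterances over tables, which contribute an MLM loss only. The loader names, the `model.mlm_loss`/`model.ssp_loss` interface, and the simple alternation scheme are assumptions for this sketch, not the authors' actual training code.

```python
from itertools import cycle


def pretrain(model, synthetic_loader, real_text_loader, optimizer, num_steps: int):
    """Alternate synthetic batches (MLM + SSP) and real-text batches (MLM only),
    so the MLM loss on human-written utterances regularizes pre-training."""
    synthetic_batches = cycle(synthetic_loader)
    real_batches = cycle(real_text_loader)
    for step in range(num_steps):
        if step % 2 == 0:
            batch = next(synthetic_batches)
            # Synthetic question-SQL pairs carry column/operation labels.
            loss = model.mlm_loss(batch) + model.ssp_loss(batch)
        else:
            batch = next(real_batches)
            # Human-written utterances over tables: no SQL labels, MLM only.
            loss = model.mlm_loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```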
Study subjects and analysis
high quality datasets: 7
As discussed in Section 2.1, GRAPPA is also pre-trained on human-annotated questions over tables with an MLM objective. We collected seven high-quality datasets for textual-tabular data understanding (Table 8 in the Appendix). All of them contain Wikipedia tables or databases and the corresponding natural language utterances written by humans. We only use the tables and contexts as a pre-training resource and discard all other human labels such as answers and SQL queries.
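To illustrate how an utterance and a table from these collected datasets might be turned into a single pre-training sequence once answers and SQL labels are discarded, here is a small sketch that concatenates the question with the column headers using a RoBERTa-style separator. The separator choice and truncation policy are assumptions, not necessarily the authors' exact preprocessing.

```python
from typing import List

SEP = "</s>"  # RoBERTa-style separator token (an assumption for this sketch)


def serialize_example(question: str, column_names: List[str], max_words: int = 128) -> str:
    """Concatenate a natural-language utterance with its table's column headers
    into one flat sequence; answers / SQL labels from the source datasets are
    simply dropped, since only the text and the table schema are used for MLM."""
    pieces = [question.lower()]
    for col in column_names:
        pieces.extend([SEP, col.lower()])
    return " ".join(" ".join(pieces).split()[:max_words])


print(serialize_example(
    "how many singers are older than 50",
    ["singer id", "name", "age", "country"],
))
# how many singers are older than 50 </s> singer id </s> name </s> age </s> country
```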

Reference
  • Rishabh Agarwal, Chen Liang, Dale Schuurmans, and Mohammad Norouzi. Learning to generalize from sparse and underspecified rewards. In ICML, 2019.
  • Jacob Andreas. Good-enough compositional data augmentation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7556–7566, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.676.
  • Yoav Artzi and Luke Zettlemoyer. Weakly supervised learning of semantic parsers for mapping instructions to actions. Transactions of the Association for Computational Linguistics, 2013.
  • Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. ArXiv, abs/1607.06450, 2016.
  • Jonathan Berant and Percy Liang. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1415–1425, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
  • Chandra Bhagavatula, Thanapon Noraset, and Doug Downey. Tabel: Entity linking in web tables. In International Semantic Web Conference, 2015.
  • Ben Bogin, Matt Gardner, and Jonathan Berant. Global reasoning over database structures for text-to-sql parsing. ArXiv, abs/1908.11214, 2019.
  • Giovanni Campagna, Agata Foryciarz, Mehrad Moradshahi, and Monica S. Lam. Zero-shot transfer learning with synthesized data for multi-domain dialogue state tracking. In Proceedings of 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2020.
  • Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, and William Yang Wang. Tabfact: A large-scale dataset for table-based fact verification. arXiv preprint arXiv:1909.02164, 2019.
  • Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, and William Wang. Hybridqa: A dataset of multi-hop question answering over tabular and textual data. arXiv preprint arXiv:2004.07347, 2020.
  • Donghyun Choi, Myeong Cheol Shin, Eunggyun Kim, and Dong Ryeol Shin. Ryansql: Recursively applying sketch-based slot fillings for complex text-to-sql in cross-domain databases. ArXiv, abs/2004.03125, 2020.
  • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke S. Zettlemoyer, and Eduard H. Hovy. Iterative search for weakly supervised semantic parsing. In NAACL-HLT, 2019.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
  • Li Dong and Mirella Lapata. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 731–742. Association for Computational Linguistics, 2018. URL http://aclweb.org/anthology/P18-1068.
  • Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan Dhanalakshmi Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir Radev. Improving text-to-sql evaluation methodology. In ACL 2018. Association for Computational Linguistics, 2018.
  • Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. Towards complex text-to-sql in cross-domain database with intermediate representation. In ACL, 2019.
  • Tong Guo and Huilin Gao. Content enhanced bert-based text-to-sql generation. Technical report, 2019.
  • Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Realm: Retrievalaugmented language model pre-training. ArXiv, abs/2002.08909, 2020.
  • Pengcheng He, Yi Mao, Kaushik Chakrabarti, and Weizhu Chen. X-sql: reinforce schema representation with context. ArXiv, abs/1908.08113, 2019.
  • Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. ArXiv, abs/1610.02136, 2016.
  • Jonathan Herzig and Jonathan Berant. Decoupling structure and lexicon for zero-shot semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1619–1629. Association for Computational Linguistics, 2018.
  • Jonathan Herzig, P. Nowak, Thomas Muller, Francesco Piccinno, and Julian Martin Eisenschlos. Tapas: Weakly supervised table parsing via pre-training. In ACL, 2020a.
  • Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Muller, Francesco Piccinno, and Julian Martin Eisenschlos. Tapas: Weakly supervised table parsing via pre-training. arXiv preprint arXiv:2004.02349, 2020b.
  • Wonseok Hwang, Jinyeung Yim, Seunghyun Park, and Minjoon Seo. A comprehensive exploration on wikisql with table-aware word contextualization. ArXiv, abs/1902.01069, 2019.
  • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. Learning a neural semantic parser from user feedback. CoRR, abs/1704.08760, 2017.
  • Robin Jia and Percy Liang. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  • Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. Pre-training via paraphrasing. arXiv preprint arXiv:2006.15020, 2020a.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880, Online, July 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.703. URL https://www.aclweb.org/anthology/2020.acl-main.703.
  • Fei Li and HV Jagadish. Constructing an interactive natural language interface for relational databases. VLDB, 2014.
  • Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, and Ni Lao. Memory augmented policy optimization for program synthesis and semantic parsing. In NeurIPS, 2018.
  • Xi Victoria Lin, Richard Socher, and Caiming Xiong. Bridging textual and tabular data for crossdomain text-to-sql semantic parsing. In Findings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP Findings 2020, November 16th-20th, 2020, 2020.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke S. Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. ArXiv, abs/1907.11692, 2019.
  • Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alche-Buc, Emily B. Fox, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 13–23, 2019.
  • Qin Lyu, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, and Zheng Chen. Hybrid ranking network for text-to-sql. Technical Report MSR-TR-2020-7, Microsoft Dynamics 365 AI, March 2020.
  • Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. A discrete hard em approach for weakly supervised question answering. In EMNLP, 2019.
  • Ankur P Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das. Totto: A controlled table-to-text generation dataset. arXiv preprint arXiv:2004.14373, 2020.
  • Panupong Pasupat and Percy Liang. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 1470–1480, 2015.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In EMNLP, pp. 1532–1543. ACL, 2014.
  • Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. Knowledge enhanced contextual word representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 43–54, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1005. URL https://www.aclweb.org/anthology/D19-1005.
  • Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Nazneen Fatema Rajani, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Murori Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, and Richard Socher. Dart: Open-domain structured data record to text generation. arXiv preprint arXiv:2007.02871, 2020.
  • Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, and Weizhu Chen. Incsql: Training incremental text-to-sql parsers with non-deterministic oracles. arXiv preprint arXiv:1809.05054, 2018.
  • Bailin Wang, Ivan Titov, and Mirella Lapata. Learning semantic parsers from denotations with latent structured alignments and abstract programs. In Proceedings of EMNLP, 2019.
  • Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7567–7578, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.677. URL https://www.aclweb.org/anthology/2020.acl-main.677.
  • Yushi Wang, Jonathan Berant, and Percy Liang. Building a semantic parser overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1332–1342, Beijing, China, July 2015a. Association for Computational Linguistics. doi: 10.3115/v1/P15-1129. URL https://www.aclweb.org/anthology/P15-1129.
  • Yushi Wang, Jonathan Berant, Percy Liang, et al. Building a semantic parser overnight. In ACL (1), pp. 1332–1342, 2015b.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. Huggingface's transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771, 2019.
  • Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8413–8426, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.745.
  • Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. Tabert: Pretraining for joint understanding of textual and tabular data. arXiv preprint arXiv:2005.08314, 2020b.
  • Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. Syntaxsqlnet: Syntax tree networks for complex and cross-domain text-to-sql task. In Proceedings of EMNLP. Association for Computational Linguistics, 2018a.
  • Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. In EMNLP, 2018b.
  • Luke S. Zettlemoyer and Michael Collins. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. UAI, 2005.
  • Li Zhang, Shuo Zhang, and Krisztian Balog. Table2vec: Neural word and entity embeddings for table population and retrieval. In Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, pp. 1029–1032, New York, NY, USA, 2019a. ACM.
  • Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. Editing-based sql query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2019b.
  • Victor Zhong, Caiming Xiong, and Richard Socher. Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103, 2017.
  • Victor Zhong, M. Lewis, Sida I. Wang, and Luke Zettlemoyer. Grounded adaptation for zero-shot executable semantic parsing. The 2020 Conference on Empirical Methods in Natural Language Processing, 2020.