Grounded Adaptation for Zero-shot Executable Semantic Parsing

Victor Zhong
Mike Lewis
Luke Zettlemoyer

EMNLP 2020, pp. 6869-6882.

Abstract:

We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g. new database schemas). GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data-augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified. On the Spider, Sparc, and CoSQL zero-shot semantic parsing tasks, GAZP improves logical form and execution accuracy of the baseline parser.
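As a rough sketch of this pipeline, the snippet below uses hypothetical helper names (in the paper both directions are trained neural models): it verbalizes queries sampled from the new environment, re-parses the resulting utterances, and keeps only execution-consistent pairs, discarding queries that fail to execute or return the empty set.

```python
# Illustrative sketch (hypothetical helper names) of GAZP-style data synthesis:
# verbalize queries sampled from the new environment, re-parse the resulting
# utterances, and keep only the cycle-consistent pairs for adaptation.
from typing import Callable, Iterable, List, Tuple

def synthesize_adaptation_data(
    sampled_queries: Iterable[str],            # logical forms grounded in the NEW database
    backward_generator: Callable[[str], str],  # query -> natural-language utterance
    forward_parser: Callable[[str], str],      # utterance -> query
    execute: Callable[[str], frozenset],       # query -> denotation (set of result rows)
) -> List[Tuple[str, str]]:
    kept = []
    for query in sampled_queries:
        utterance = backward_generator(query)  # backward step: verbalize the sampled query
        reparsed = forward_parser(utterance)   # forward step: parse the synthetic utterance
        try:
            gold, pred = execute(query), execute(reparsed)
        except Exception:
            continue                           # discard queries that fail to execute
        if gold and gold == pred:              # drop empty results; keep consistent pairs
            kept.append((utterance, query))
    return kept
```

The forward parser is then fine-tuned on the union of the original training data and the retained synthetic pairs.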

Introduction
Highlights
  • Semantic parsers (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al, 2011) build executable meaning representations for a range of tasks such as question-answering (Yih et al, 2014), robotic control (Matuszek et al, 2013), and intelligent tutoring systems (Graesser et al, 2005)
  • We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing, which adapts existing semantic parsers to new environments by synthesizing new, cycle-consistent data
  • We show that cycle-consistency is critical to synthesizing high-quality examples in the new environment, which in turn allows for successful adaptation and improved performance
  • Our results indicate that adaptation to the new environment significantly outperforms augmentation in the training environment
  • We proposed Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments by synthesizing cycle-consistent data
  • Our analyses showed that GAZP outperforms data augmentation, performance improvement scales with the amount of GAZP-synthesized data, and cycle-consistency is central to successful adaptation
Methods
  • The authors evaluate performance on the Spider (Yu et al, 2018b), Sparc (Yu et al, 2019b), and CoSQL (Yu et al, 2019a) zero-shot semantic parsing tasks.
  • Table 1 shows dataset statistics.
  • Figure 4 shows examples from each dataset.
  • The authors use preprocessing steps from Zhang et al (2019) to preprocess SQL logical forms.
  • Evaluation consists of exact match over logical form templates (EM) in which values are stripped out, as well as execution accuracy (EX).
  • Official evaluations recently incorporated fuzz-test accuracy (FX) as a tighter variant of execution accuracy.
  • Compared to an execution match, a fuzz-test execution match is less likely to be spurious.
  • Because the FX implementation is not public as of writing, the authors only report FX on the test set (a simplified sketch of these metrics follows this list).
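The following minimal sketch illustrates the three metrics described above. The value-stripping regexes and helper names are assumptions for illustration only; the official Spider evaluation scripts perform far more careful SQL canonicalization and component matching.

```python
# Simplified sketch of the EM / EX / FX metrics described above.
import re
import sqlite3
from typing import Iterable

def template(sql: str) -> str:
    """Strip literal values so only the logical-form template remains (used for EM)."""
    sql = re.sub(r"'[^']*'", "value", sql)           # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "value", sql)   # numeric literals
    return " ".join(sql.lower().split())

def exact_match(pred_sql: str, gold_sql: str) -> bool:
    """EM: compare logical-form templates with values stripped out."""
    return template(pred_sql) == template(gold_sql)

def execution_match(pred_sql: str, gold_sql: str, db_path: str) -> bool:
    """EX: both fully-specified queries return the same rows on the given database."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = frozenset(conn.execute(pred_sql).fetchall())
        gold_rows = frozenset(conn.execute(gold_sql).fetchall())
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    return pred_rows == gold_rows

def fuzz_test_match(pred_sql: str, gold_sql: str, randomized_dbs: Iterable[str]) -> bool:
    """FX-style check: agreement must hold across databases with randomized contents."""
    return all(execution_match(pred_sql, gold_sql, db) for db in randomized_dbs)
```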
Results
  • The authors primarily compare GAZP with the baseline forward semantic parser, because prior systems produce queries without values, which are not executable.
  • The authors include one such non-executable model, EditSQL (Zhang et al, 2019), one of the top parsers on Spider at the time of writing, for reference.
  • Appendix A.3 shows synthesized adaptation examples and compares them to real examples.
Conclusion
  • The authors proposed GAZP to adapt an existing semantic parser to new environments by synthesizing cycle-consistent data.
  • GAZP improved parsing performance on three zero-shot parsing tasks.
  • GAZP applies to any problem that lacks annotated data and whose training and inference environments differ.
  • One such area is robotics, where one trains in simulation because it is prohibitively expensive to collect annotated trajectories in the real world.
  • The authors will consider how to interpret environment specifications to facilitate grounded adaptation in these other areas.
Summary
  • Introduction:

    Semantic parsers (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al, 2011) build executable meaning representations for a range of tasks such as question-answering (Yih et al, 2014), robotic control (Matuszek et al, 2013), and intelligent tutoring systems (Graesser et al, 2005)
  • They are usually engineered for each application environment.
  • Compared to data-augmentation, which typically synthesizes unverified data in the training environment, GAZP instead synthesizes consistency-verified data in the new environment.
Tables
  • Table1: Dataset statistics
  • Table2: Development set evaluation results on Spider, Sparc, and CoSQL. EM is exact match accuracy of logical form templates without values. EX is execution accuracy of fully-specified logical forms with values. FX is execution accuracy from fuzz-testing with randomized databases. Baseline is the forward parser without adaptation. EditSQL is a state-of-the-art language-to-SQL parser that produces logical form templates that are not executable
  • Table3: Ablation performance on development sets. For each ablation, 100,000 examples are synthesized, out of which queries that do not execute or execute to the empty set are discarded. “nocycle” uses adaptation without cycle-consistency. “syntrain” uses data-augmentation on training environments. “EM consistency” enforces logical form instead of execution consistency
  • Table4: Dropout rates for the forward parser
  • Table5: Dropout rates for the backward generator
  • Table6: Examples of synthesized queries
  • Table7: Difficulty breakdown for Spider test set
  • Table8: Difficulty breakdown for Sparc test set
  • Table9: Difficulty breakdown for CoSQL test set
  • Table10: Turn breakdown for Sparc test set
  • Table11: Turn breakdown for CoSQL test set
Related work
  • Semantic parsing. Semantic parsers parse natural language utterances into executable logical forms with respect to an environment (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Liang et al, 2011). In zero-shot semantic parsing, the model is required to generalize to environments (e.g. new domains, new database schemas) not seen during training (Pasupat and Liang, 2015; Zhong et al, 2017; Yu et al, 2018b). For language-to-SQL zero-shot semantic parsing, a variety of methods have been proposed to generalize to new databases by selecting from table schemas in the new database (Zhang et al, 2019; Guo et al, 2019). Our method is complementary to this line of work: the synthesis, cycle-consistency, and adaptation steps in GAZP can be applied to any parser, so long as we can learn a backward utterance generator and evaluate logical-form equivalence.

  • Data augmentation. Data augmentation transforms original training data to synthesize artificial training data. Krizhevsky et al (2017) crop and rotate input images to improve object recognition. Dong et al (2017) and Yu et al (2018a) respectively paraphrase and back-translate (Sennrich et al, 2016; Edunov et al, 2018) questions and documents to improve question-answering. Jia and Liang (2016) perform data-recombination in the training domain to improve semantic parsing. Hannun et al (2014) superimpose noisy background tracks with input tracks to improve speech recognition. Our method is distinct from data-augmentation in the following ways. First, we synthesize data on logical forms sampled from the new environment instead of the original environment, which allows for adaptation to the new environment. Second, we propose cycle-consistency to prune low-quality data and keep high-quality data for adaptation. Our analyses show that these core differences from data-augmentation are central to improving parsing performance.
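As a rough illustration of the first point, the hypothetical snippet below fills coarse query templates with tables, columns, and values read from the new database, so the sampled logical forms are grounded in the inference environment rather than the training one. The templates and helper here are toy stand-ins, not the paper's actual template-extraction and sampling procedure.

```python
# Hypothetical illustration: populate coarse query templates with tables, columns,
# and values drawn from the NEW database, so synthesized logical forms match the
# inference environment rather than the training environment.
import random
import sqlite3

TEMPLATES = [
    "SELECT {column} FROM {table} WHERE {column} = {value}",
    "SELECT COUNT(*) FROM {table}",
]

def sample_query(db_path: str) -> str:
    conn = sqlite3.connect(db_path)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    table = random.choice(tables)
    columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    column = random.choice(columns)
    row = conn.execute(f"SELECT {column} FROM {table} LIMIT 1").fetchone()
    conn.close()
    value = repr(row[0]) if row else "''"
    return random.choice(TEMPLATES).format(table=table, column=column, value=value)
```

Queries sampled this way would then be passed through the backward generator and the cycle-consistency filter sketched earlier.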
Study subjects and analysis
datasets: 3
Figure 4 shows examples from each dataset. For all three datasets, we use preprocessing steps from Zhang et al (2019) to preprocess SQL logical forms. Evaluation consists of exact match over logical form templates (EM) in which values are stripped out, as well as execution accuracy (EX)

Reference
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, and Kai Yu. 2019. Semantic parsing with dual learning. In ACL.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Li Dong, Jonathan Mallinson, Siva Reddy, and Mirella Lapata. 2017. Learning to paraphrase for question answering. In EMNLP.
  • Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. In EMNLP.
  • Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, and Trevor Darrell. 2018. Speaker-follower models for vision-and-language navigation. In NeurIPS.
  • Arthur C. Graesser, Patrick Chipman, Brian C. Haynes, and Andrew Olney. 2005. AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education.
  • Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In ACL.
  • Awni Y. Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. 2014. Deep Speech: Scaling up end-to-end speech recognition. CoRR, abs/1412.5567.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.
  • Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, and Trevor Darrell. 2018. CyCADA: Cycle-consistent adversarial domain adaptation. In ICML.
  • Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In ACL.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM.
  • Percy Liang, Michael I. Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. Computational Linguistics.
  • Cynthia Matuszek, Evan Herbst, Luke Zettlemoyer, and Dieter Fox. 2013. Learning to parse natural language commands to a robot control system. Experimental Robotics.
  • Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In ACL.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs/1910.01108.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In ACL.
  • Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In NIPS.
  • Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic parsing for single-relation question answering. In ACL.
  • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018a. QANet: Combining local convolution with global self-attention for reading comprehension. In ICLR.
  • Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, and Dragomir Radev. 2019a. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. In EMNLP.
  • Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018b. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In EMNLP.
  • Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019b. SParC: Cross-domain semantic parsing in context. In ACL.
  • John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In AAAI/IAAI.
  • Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.
  • Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In EMNLP.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.