Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations

EMNLP, pp. 2442-2452, 2018.

Abstract:

Semantic parsing from denotations faces two key challenges in model training: (1) given only the denotations (e.g., answers), search for good candidate semantic parses, and (2) choose the best model update algorithm. We propose effective and general solutions to each of them. Using policy shaping, we bias the search procedure towards semantic parses that are more compatible with the text...

Introduction
  • Semantic parsing from denotations (SpFD) is the problem of mapping text to executable formal representations in a situated environment and executing them to generate denotations, in the absence of access to correct representations.
  • For example, given a question and a table environment, a semantic parser maps the question to an executable program, in this case a SQL query, and executes the query on the environment to generate the answer "England".
  • The existing learning approaches for SpFD perform two steps for every training example: a search step that explores the space of programs, and an update step that uses the programs found by search to update the model parameters.
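The execute-and-observe setup described above can be made concrete with a toy interpreter. The sketch below is purely illustrative: the table contents, column names, and the `execute` helper are made up for this example and are not the actual SQA data or execution engine.

```python
# Toy illustration of semantic parsing from denotations (SpFD):
# a candidate program (a simple filter + projection) is executed against a
# table environment, and only the resulting denotation ("England") is
# available as supervision. Everything here is a stand-in, not the paper's
# interpreter.

table = [
    {"Nation": "England", "Gold": 3},
    {"Nation": "France",  "Gold": 1},
]

def execute(program, table):
    """Run a (column, predicate, projection) style program on the table."""
    column, predicate, projection = program
    rows = [row for row in table if predicate(row[column])]
    return [row[projection] for row in rows]

# A candidate program for a question like "which nation won the most gold medals?"
program = ("Gold", lambda v: v == max(r["Gold"] for r in table), "Nation")
denotation = execute(program, table)
print(denotation)  # ['England'] -- only this answer is observed during training
```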
Highlights
  • Semantic parsing from denotations (SpFD) is the problem of mapping text to executable formal representations in a situated environment and executing them to generate denotations, in the absence of access to correct representations
  • Policy shaping generally improves performance across the different update algorithms
  • We found that the mean difference for Margin Avg. Violation Reward is 0.57%, whereas the mean difference for Maximum Margin Reward is 0.9%
  • We found that 53 programs were spurious; using policy shaping, this number came down to 23
  • Using Margin Avg. Violation Reward updates, we found that the dev accuracy dropped to 37.1%
  • In this paper, we propose a general update equation for semantic parsing from denotations and a policy shaping method for addressing the spurious program challenge
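A minimal sketch of what policy shaping can look like, following the product-of-policies formulation of Griffith et al. (2013) that the paper builds on: the model's distribution over candidate programs is multiplied by a critique distribution and renormalized, so programs the critique deems incompatible with the text are down-weighted during search. The particular candidate names and probabilities below are hypothetical.

```python
# Hedged sketch of policy shaping: combine the model policy with a critique
# policy by taking the normalized product of the two distributions over
# candidate programs. All numbers are hypothetical placeholders.

def shape_policy(model_probs, critique_probs):
    """Return the shaped distribution p_shaped(y) proportional to p_model(y) * p_critique(y)."""
    combined = {y: model_probs[y] * critique_probs[y] for y in model_probs}
    z = sum(combined.values())
    return {y: p / z for y, p in combined.items()}

model_probs = {"spurious_program": 0.6, "correct_program": 0.4}
critique_probs = {"spurious_program": 0.2, "correct_program": 0.8}

print(shape_policy(model_probs, critique_probs))
# The correct program now dominates, biasing search toward parses that are
# more compatible with the text.
```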
Methods
  • The authors describe the setup in §5.1 and results in §5.2.
  • Dataset: The authors use the sequential question answering (SQA) dataset (Iyyer et al., 2017) for the experiments.
  • The data is partitioned into training (83%) and test (17%) splits.
  • The authors use 4/5 of the original train split as the training set and the remaining 1/5 as the dev set.
  • The previous state-of-the-art result on the SQA dataset is 44.7% accuracy, using maximum margin reward learning
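The splits described above amount to a simple partition of the released data. The following is a generic sketch (not the authors' preprocessing code): SQA ships with an 83%/17% train/test partition, and here the released train split is further divided into 4/5 training and 1/5 dev.

```python
# Generic illustration of the data splits described above. `train_split` is a
# placeholder list, not the actual SQA examples.
import random

def make_splits(examples, dev_fraction=0.2, seed=0):
    """Shuffle the released train split and carve out a dev set (1/5 by default)."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    return examples[n_dev:], examples[:n_dev]  # (train, dev)

train_split = [f"example_{i}" for i in range(100)]  # stand-in for the SQA train split
train_set, dev_set = make_splits(train_split)
print(len(train_set), len(dev_set))  # 80 20
```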
Results
  • Policy Gradient vs. Off-Policy Policy Gradient: REINFORCE, a simple policy gradient method, achieved extremely poor performance.
  • This is likely due to the problem of exploration and having to sample from a large space of programs.
  • This is further corroborated by the much superior performance of off-policy policy gradient methods.
  • The sampling policy is an important factor to consider for policy gradient methods
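The contrast between on-policy REINFORCE and an off-policy variant can be sketched as follows. The importance-weighted form below is a standard off-policy correction under an exploration policy u; it is offered as an assumption about the general technique, not as the paper's exact estimator, and all distributions and rewards are placeholders.

```python
# Hedged sketch: REINFORCE samples programs from the current model policy,
# which rarely finds rewarded programs in a large program space; an off-policy
# variant samples from an exploration policy u and corrects each sample's
# gradient weight by the importance ratio p(y) / u(y).
import random

p_model = {"correct": 0.05, "spurious": 0.05, "wrong_1": 0.45, "wrong_2": 0.45}
u_explore = {"correct": 0.40, "spurious": 0.30, "wrong_1": 0.15, "wrong_2": 0.15}
reward = {"correct": 1.0, "spurious": 1.0, "wrong_1": 0.0, "wrong_2": 0.0}

def sample(dist, rng):
    """Draw one program according to the given distribution."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(0)

# REINFORCE (on-policy): the gradient weight on the sampled program is just R(y).
y_on = sample(p_model, rng)
w_on = reward[y_on]

# Off-policy policy gradient: sample from u, reweight by p(y) / u(y) * R(y).
y_off = sample(u_explore, rng)
w_off = p_model[y_off] / u_explore[y_off] * reward[y_off]

print(y_on, w_on, y_off, w_off)
```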
Conclusion
  • The authors propose a general update equation for semantic parsing from denotations and a policy shaping method for addressing the spurious program challenge.
  • The authors plan to apply the proposed learning framework to more semantic parsing tasks and consider new methods for policy shaping
Tables
  • Table1: Parameter updates for various learning algorithms are special cases of Eq (9), with different choices of intensity w and competing distribution q. We do not show dependence upon table t for brevity. For off-policy policy gradient, u is the exploration policy. For margin methods, y* is the reference program (see §4.1), V is the set of programs that violate the margin constraint (cf. Eq (7)), and ȳ is the most violating program (cf. Eq (8)). (A hedged sketch of this general form is given after this list.)
  • Table2: Experimental results on different model update algorithms, with and without policy shaping
  • Table3: The dev set results on the new variations of the update algorithms
  • Table4: Training examples and the highest ranked program in the beam search, scored according to the shaped policy, after training with MAVER. Using policy shaping, we can recover from failures due to spurious programs in the search step for these examples
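Read together with the caption of Table 1, the generalized update can plausibly be written in the following form, where w is the intensity, q the competing distribution, and s_θ the model's scoring function. This is a reconstruction from the caption's description, not the paper's exact Eq. (9); the notation s_θ is assumed.

```latex
% Hedged reconstruction of the generalized update from Table 1's caption:
% each candidate program y contributes with intensity w and is contrasted
% against programs drawn from a competing distribution q.
\[
  \Delta(x, t) \;=\; \sum_{y \in \mathcal{Y}} w(y \mid x, t)
  \Big( \nabla_{\theta}\, s_{\theta}(y \mid x, t)
        \;-\; \sum_{y' \in \mathcal{Y}} q(y' \mid x, t)\,
              \nabla_{\theta}\, s_{\theta}(y' \mid x, t) \Big)
\]
```

Under this reading, different algorithms correspond to different choices of w and q (e.g., margin methods concentrate w on the reference program and q on margin-violating programs), which is what Table 1 enumerates.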
Related work
  • Semantic Parsing from Denotation Mapping natural language text to formal meaning representation was first studied by Montague (1970). Early work on learning semantic parsers relies on labeled formal representations as the supervision signal (Zettlemoyer and Collins, 2005, 2007; Zelle and Mooney, 1993). However, because getting access to gold formal representations generally requires expensive annotation by an expert, distant supervision approaches, where semantic parsers are learned from denotations only, have become the main learning paradigm (e.g., Clarke et al., 2010; Liang et al., 2011; Artzi and Zettlemoyer, 2013; Berant et al., 2013; Iyyer et al., 2017; Krishnamurthy et al., 2017). Guu et al. (2017) studied the problem of spurious programs, considered adding noise to diversify the search procedure, and introduced meritocratic updates.

    Reinforcement Learning Algorithms Reinforcement learning algorithms have been applied to various NLP problems including dialogue (Li et al., 2016), text-based games (Narasimhan et al., 2015), information extraction (Narasimhan et al., 2016), coreference resolution (Clark and Manning, 2016), semantic parsing (Guu et al., 2017) and instruction following (Misra et al., 2017). Guu et al. (2017) show that policy gradient methods underperform maximum marginal likelihood approaches. Our result on the SQA dataset supports their observation. However, we show that using off-policy sampling, policy gradient methods can provide superior performance to maximum marginal likelihood methods.

    Example questions with the highest-ranked program found without and with policy shaping (cf. Table 4):

    | Question | Without policy shaping | With policy shaping |
    | "of these teams, which had more than 21 losses?" | SELECT Club WHERE Losses = ROW 15 | SELECT Club WHERE Losses > 21 |
    | "of the remaining, which earned the most bronze medals?" | SELECT Nation WHERE Rank = ROW 1 | FollowUp WHERE Bronze is Max |
    | "of those competitors from germany, which was not paul sievert?" | SELECT Name WHERE Time (hand) = ROW 3 | FollowUp WHERE Name != ROW 5 |
Reference
  • Yoav Artzi and Luke Zettlemoyer. 2013. Weakly supervised learning of semantic parsers for mapping instructions to actions. Transactions of the Association for Computational Linguistics, 1:49–62.
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Kevin Clark and Christopher D. Manning. 2016. Deep reinforcement learning for mention-ranking coreference models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world's response. In Proceedings of the Conference on Computational Natural Language Learning.
  • Hal Daumé III and Daniel Marcu. 2005. Learning as search optimization: Approximate large margin methods for structured prediction. In Proceedings of the 22nd International Conference on Machine Learning, pages 169–176. ACM.
  • Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38.
  • Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles Lee Isbell, and Andrea Lockerd Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in Neural Information Processing Systems.
  • Kelvin Guu, Panupong Pasupat, Evan Liu, and Percy Liang. 2017. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1051–1062. Association for Computational Linguistics.
  • Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151. Association for Computational Linguistics.
  • Mohit Iyyer, Wen-tau Yih, and Ming-Wei Chang. 2017. Search-based neural structured learning for sequential question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1821–1831. Association for Computational Linguistics.
  • Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1516–1526.
  • Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, and Dan Klein. 2015. An empirical analysis of optimization for max-margin NLP. In EMNLP.
  • Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2016. Global neural CCG parsing with optimality guarantees. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2366–2376.
  • Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, and Jianfeng Gao. 2016. Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • Percy Liang, Michael I. Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the Conference of the Association for Computational Linguistics.
  • Ryan T. McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In ACL.
  • Dipendra Misra, John Langford, and Yoav Artzi. 2017. Mapping instructions and visual observations to actions with reinforcement learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. Arxiv preprint: https://arxiv.org/abs/1704.08795.
  • Dipendra K. Misra and Yoav Artzi. 2016. Neural shift-reduce CCG semantic parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Dipendra Kumar Misra, Kejia Tao, Percy Liang, and Ashutosh Saxena. 2015. Environment-driven lexicon induction for high-level instructions. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
  • Richard Montague. 1970. English as a formal language.
  • Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. 2015. Language understanding for text-based games using deep reinforcement learning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
  • Karthik Narasimhan, Adam Yala, and Regina Barzilay. 2016. Improving information extraction by acquiring external evidence with reinforcement learning. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • Ariadna Quattoni, Sybor Wang, Louis-Philippe Morency, Michael Collins, and Trevor Darrell. 2007. Hidden conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10).
  • Kenneth Rose. 1998. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239.
  • Rajhans Samdani, Ming-Wei Chang, and Dan Roth. 2012. Unified expectation maximization. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 688–698. Association for Computational Linguistics.
  • Ben Taskar, Carlos Guestrin, and Daphne Koller. 2003. Max-margin Markov networks. In NIPS.
  • Ben Taskar, Dan Klein, Mike Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
  • Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In EMNLP-CoNLL.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8.
  • Chun-Nam John Yu and Thorsten Joachims. 2009. Learning structural SVMs with latent variables. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1169–1176. ACM.
  • John M. Zelle and Raymond J. Mooney. 1993. Learning semantic grammars with constructive inductive logic programming. In AAAI, pages 817–822.
  • Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
  • Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103.