An Imitation Game for Learning Semantic Parsers from User Interaction

EMNLP 2020, pp. 6883–6902


Abstract

Despite the widely successful applications, bootstrapping and fine-tuning semantic parsers are still a tedious process with challenges such as costly data annotation and privacy risks. In this paper, we suggest an alternative, human-in-the-loop methodology for learning semantic parsers directly from users. A semantic parser should be introspective of its uncertainties…

Introduction
  • Semantic parsing has found tremendous applications in building natural language interfaces that allow users to query data and invoke services without programming (Woods, 1973; Zettlemoyer and Collins, 2005; Berant et al, 2013; Su et al, 2017; Yu et al, 2018).
  • The lifecycle of a semantic parser typically consists of two stages: (1) bootstrapping, where the authors keep collecting labeled data via trained annotators and/or crowdsourcing for model training until the parser reaches commercial-grade performance (e.g., 95% accuracy on a surrogate test set), and (2) fine-tuning, where the authors deploy the system, analyze its usage, and collect and annotate new data to address identified problems or emerging needs.
  • (Figure 1: an example user interaction over a WikiSQL table of basketball players, with columns No., Player, Nationality, Position, and School/Club Team.)
Highlights
  • Semantic parsing has found tremendous applications in building natural language interfaces that allow users to query data and invoke services without programming (Woods, 1973; Zettlemoyer and Collins, 2005; Berant et al, 2013; Su et al, 2017; Yu et al, 2018)
  • Our work extends interactive semantic parsing, a recent idea that leverages system-user interactions to improve semantic parsing on the fly (Li and Jagadish, 2014; He et al, 2016; Chaurasia and Mooney, 2017; Su et al, 2018; Gur et al, 2018; Yao et al, 2019a,b)
  • We evaluate each system by answering the two research questions (RQs):
  • RQ1: Can the system learn a semantic parser without requiring a large number of annotations?
  • For Full Expert, this number equals the trajectory length of the gold query (e.g., 5 for the query in Figure 1); for MISP-L and MISP-L*, it is the number of user interactions during training.
  • On the WikiSQL dataset (Zhong et al., 2017), compared with the full-annotation baseline, we show that when bootstrapped using only 10% of the training data, our method achieves almost the same test accuracy (a 2% absolute loss) while using less than 10% of the annotations, even before accounting for the different unit costs of annotation from users vs. domain experts.
  • We explore building an interactive semantic parser that continually improves itself from end user interaction, without involving annotators or developers
Results
  • For RQ1, the authors measure the number of user/expert annotations a system requires to train a parser.
  • For Binary User (+Expert), it is hard to quantify “one annotation”, as it varies with the actual database size and the query difficulty.
  • The authors approximate this number by calculating it in the same way as for Full Expert, under the assumption that validating an answer is in general as hard as validating the SQL query itself (see the sketch below).
  • Note that while the authors do not differentiate the actual costs of users and experts in this respect, they emphasize that the system enjoys an additional benefit: it collects training examples from a much cheaper and more abundant source while serving end users’ needs at the same time.
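To make this accounting concrete, here is a minimal sketch of the annotation bookkeeping described above; the function names and the representation of a trajectory are our own illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence

def trajectory_length(gold_query_actions: Sequence[str]) -> int:
    """One annotation per decoding decision of the gold query,
    e.g., 5 for the query in Figure 1."""
    return len(gold_query_actions)

def full_expert_cost(gold_trajectories: List[Sequence[str]]) -> int:
    # Full Expert (and the Binary User(+Expert) approximation): every
    # decision of every gold query counts as one annotation.
    return sum(trajectory_length(t) for t in gold_trajectories)

def interactive_cost(interactions_per_example: List[int]) -> int:
    # MISP-L / MISP-L*: only the user interactions during training count.
    return sum(interactions_per_example)
```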
Conclusion
  • The authors explore building an interactive semantic parser that continually improves itself from end user interaction, without involving annotators or developers.
  • One important future work is to conduct a large-scale user study and collect interactions from real users.
  • This is not trivial and has to account for uncertainties such as noisy user feedback.
  • By analyzing real users’ statistics, the authors believe a more accurate and realistic user simulation can be developed.
Related work
  • Interactive Semantic Parsing. Our work extends interactive semantic parsing, a recent idea that leverages system-user interactions to improve semantic parsing on the fly (Li and Jagadish, 2014; He et al, 2016; Chaurasia and Mooney, 2017; Su et al, 2018; Gur et al, 2018; Yao et al, 2019a,b). As an example, Gur et al (2018) built a neural model to identify and correct error spans in a generated SQL query via dialogues. Yao et al (2019b) further generalized the interaction framework by formalizing a model-based intelligent agent called MISP. Our system leverages MISP to support interactivity but focuses on developing an algorithm for continually improving the base parser from end user interactions, which has not been accomplished by previous work.

    Feedback-based Interactive Learning. Learning interactively from user feedback has been studied for machine translation (Nguyen et al., 2017; Petrushkov et al., 2018; Kreutzer and Riezler, 2019) and other NLP tasks (Sokolov et al., 2016; Gao et al., 2018; Hancock et al., 2019). Most relevant to this work, Hancock et al. (2019) constructed a chatbot that learns to request feedback when the user is unsatisfied with the system response, and then periodically improves itself from the satisfied responses and the feedback responses. Their work reaffirms the necessity of human-in-the-loop autonomous learning systems like ours.
Funding
  • This research was sponsored in part by the Army Research Office under cooperative agreement W911NF-17-1-0412, NSF Grant IIS-1815674, a Fujitsu gift grant, and the Ohio Supercomputer Center (Center, 1987).
Study subjects and analysis
cases: 3
where $D$ is the aggregated training data over $i$ iterations and $w_t$ denotes the weight of $(s_t, a_t)$. We consider assigning the weight $w_t$ in three cases: (1) For confident actions $a_t$, we set $w_t = 1$. This essentially treats confident actions as gold decisions, which resembles self-training (Nigam and Ghani, 2000).
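As a concrete illustration of how such weighted examples could be aggregated and reused, here is a minimal sketch; only case (1) is stated in the excerpt above, so the remaining branches and all names (WeightedExample, weight_action, aggregate) are illustrative assumptions rather than the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# s_t = (question q, partial action sequence a_{1:t-1})
State = Tuple[str, Tuple[str, ...]]

@dataclass
class WeightedExample:
    state: State
    action: str
    weight: float  # w_t

def weight_action(is_confident: bool, user_demonstrated: bool) -> float:
    """Assign w_t to (s_t, a_t). Only the first branch comes from the
    excerpt above; the others are illustrative placeholders."""
    if is_confident:
        return 1.0  # case (1): treat confident actions as gold (self-training)
    if user_demonstrated:
        return 1.0  # assumption: user-demonstrated actions also act as gold
    return 0.0      # assumption: remaining actions are effectively discarded

def aggregate(D: List[WeightedExample],
              new_batch: List[WeightedExample]) -> List[WeightedExample]:
    # DAgger-style dataset aggregation: D <- D U D_i; the parser is then
    # retrained on the weighted loss sum over (s_t, a_t) in D of
    # w_t * loss(pi(s_t), a_t).
    return D + new_batch
```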

pairs: 56,355
We test our system on the WikiSQL dataset (Zhong et al., 2017). The dataset contains a large number of annotated question-SQL pairs (56,355 pairs for training) and thus serves as a good resource for experimenting with iterative learning. For the base semantic parser, we choose SQLova (Hwang et al., 2019), one of the top-performing models on WikiSQL.

question-SQL query pairs: 7,377
For the base semantic parser, we choose EditSQL (Zhang et al., 2019), one of the open-sourced top models on Spider. Given the small size of Spider (7,377 question-SQL query pairs for training after data cleaning), we experiment with only one initialization setting, using 10% of the training set. Since Spider models do not predict the specific values in a SQL query (e.g., “jalen rose” in Figure 1), we cannot execute the generated query to simulate the binary execution feedback (see the sketch below).
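For WikiSQL, where queries can be executed, this binary execution feedback can be simulated as sketched below; this is a minimal illustration using Python's built-in sqlite3, not the authors' implementation, and db_path is a hypothetical path to the table's database.

```python
import sqlite3
from typing import List

def _run(conn: sqlite3.Connection, sql: str) -> List[str]:
    # Order-insensitive, type-safe comparison of result sets.
    return sorted(repr(row) for row in conn.execute(sql).fetchall())

def binary_execution_feedback(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Simulated binary feedback: True iff the predicted query returns the
    same answer as the gold query. Not applicable to Spider models that
    do not predict values, as noted above."""
    conn = sqlite3.connect(db_path)
    try:
        try:
            return _run(conn, predicted_sql) == _run(conn, gold_sql)
        except sqlite3.Error:
            return False  # an unexecutable prediction counts as a "no"
    finally:
        conn.close()
```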

References
  • Kazuoki Azuma. 1967. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, Second Series, 19(3):357–367.
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544.
  • Ohio Supercomputer Center. 1987. Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73.
  • Shobhit Chaurasia and Raymond J. Mooney. 2017. Dialog for language to code. In Proceedings of the Eighth International Joint Conference on Natural Language Processing.
  • Sonia Chernova and Manuela Veloso. 2009. Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34:1–25.
  • James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world’s response. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 18–27. Association for Computational Linguistics.
  • Hal Daumé III, John Langford, and Daniel Marcu. 2009. Search-based structured prediction. Machine Learning, 75(3):297–325.
  • Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip R. Cohen, and Mark Johnson. 2018. Active learning for deep semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 43–48.
  • Meng Fang, Yuan Li, and Trevor Cohn. 2017. Learning how to active learn: A deep reinforcement learning approach. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 595–605.
  • Yang Gao, Christian M. Meyer, and Iryna Gurevych. 2018. APRIL: Interactively learning to summarise by combining active preference learning and reinforcement learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4120–4130.
  • Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330.
  • Izzeddin Gur, Semih Yavuz, Yu Su, and Xifeng Yan. 2018. DialSQL: Dialogue based structured query generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1339–1349.
  • Kelvin Guu, Panupong Pasupat, Evan Liu, and Percy Liang. 2017. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1051–1062.
  • Braden Hancock, Antoine Bordes, Pierre-Emmanuel Mazare, and Jason Weston. 2019. Learning from dialogue after deployment: Feed yourself, chatbot! In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3667–3684.
  • Elad Hazan, Amit Agarwal, and Satyen Kale. 2007. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192.
  • Luheng He, Julian Michael, Mike Lewis, and Luke Zettlemoyer. 2016. Human-in-the-loop parsing. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2337–2342.
  • Wassily Hoeffding. 1994. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer.
  • Wonseok Hwang, Jinyeung Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on WikiSQL with table-aware word contextualization. arXiv preprint arXiv:1902.01069.
  • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973.
  • Kshitij Judah, Alan P. Fern, Thomas G. Dietterich, and Prasad Tadepalli. 2014. Active imitation learning: Formal and practical reductions to IID learning. Journal of Machine Learning Research, 15:4105–4143.
  • Sham M. Kakade and Ambuj Tewari. 2009. On the generalization ability of online strongly convex programming algorithms. In Advances in Neural Information Processing Systems, pages 801–808.
  • Beomjoon Kim and Joelle Pineau. 2013. Maximum mean discrepancy imitation learning. In Robotics: Science and Systems.
  • Julia Kreutzer and Stefan Riezler. 2019. Self-regulated interactive sequence-to-sequence learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 303–315.
  • Fei Li and H. V. Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8(1):73–84.
  • Natasha Lomas. 2019. Google ordered to halt human review of voice AI recordings over privacy risks. https://techcrunch.com/2019/08/02/google-ordered-to-halt-human-review-of-voice-ai-recordings-over-privacy-risks/. Accessed: 2020-04-28.
  • Stephen Mayhew, Snigdha Chaturvedi, Chen-Tse Tsai, and Dan Roth. 2019. Named entity recognition with partially annotated training data. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 645–655.
  • Khanh Nguyen, Hal Daumé III, and Jordan Boyd-Graber. 2017. Reinforcement learning for bandit neural machine translation with simulated human feedback. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1464–1474.
  • Ansong Ni, Pengcheng Yin, and Graham Neubig. 2020. Merging weak and active supervision for semantic parsing. In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA.
  • Kamal Nigam and Rayid Ghani. 2000. Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 86–93.
  • Pavel Petrushkov, Shahram Khadivi, and Evgeny Matusov. 2018. Learning from chunk-based feedback in neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 326–331.
  • Stephane Ross and Drew Bagnell. 2010. Efficient reductions for imitation learning. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 661–668.
  • Stephane Ross and J. Andrew Bagnell. 2014. Reinforcement and imitation learning via interactive no-regret learning. arXiv preprint arXiv:1406.5979.
  • Stephane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 627–635.
  • Artem Sokolov, Julia Kreutzer, Christopher Lo, and Stefan Riezler. 2016. Learning structured predictors from bandit feedback for interactive NLP. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1610–1620.
  • Yu Su, Ahmed Hassan Awadallah, Madian Khabsa, Patrick Pantel, Michael Gamon, and Mark Encarnacion. 2017. Building natural language interfaces to web APIs. In Proceedings of the International Conference on Information and Knowledge Management.
  • Yu Su, Ahmed Hassan Awadallah, Miaosen Wang, and Ryen W. White. 2018. Natural language interfaces with fine-grained user interaction: A case study on web APIs. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval.
  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.
  • Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston. 2020. Neural text generation with unlikelihood training. In International Conference on Learning Representations.
  • William A. Woods. 1973. Progress in natural language understanding: An application to lunar geology. In Proceedings of the American Federation of Information Processing Societies Conference.
  • Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, and Huan Sun. 2019a. Interactive semantic parsing for if-then recipes via hierarchical reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2547–2554.
  • Ziyu Yao, Yu Su, Huan Sun, and Wen-tau Yih. 2019b. Model-based interactive semantic parsing: A unified framework and a text-to-SQL case study. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5450–5461.
  • Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921.
  • Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pages 658–666.
  • Jiakai Zhang and Kyunghyun Cho. 2017. Query-efficient imitation learning for end-to-end simulated driving. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5341–5352.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
Appendix excerpts
  • Following Ross et al. (2011), we first focus the proof on the infinite-sample case, which assumes an infinite number of samples to train a policy in each iteration (i.e., $m = \infty$ in Algorithm 1). As an overview, we start the analysis by introducing the “cost function” used to analyze each policy in Appendix A.1, which represents an inverse quality of a policy. In Appendix A.2, we derive the cost bound of the supervised approach. Appendix A.3 and Appendix A.4 then discuss the cost bound of our proposed algorithm. Finally, in Appendix A.5, we show the cost bound of our algorithm in the finite-sample case.
  • The second equality holds by the definition $s_t = (q, a_{1:t-1})$. In this analysis, we follow Ross and Bagnell (2010); Ross et al. (2011) to assume a unified decision length $T$. By summing up the above expected cost over the $T$ steps, we define the total cost of executing policy $\pi$ for $T$ steps as $J(\pi) = \sum_{t=1}^{T} \mathbb{E}_{s \sim d_\pi^t}[C(s, \pi)]$.
  • Many no-regret algorithms (Hazan et al., 2007; Kakade and Tewari, 2009) guarantee $\gamma_N \in \tilde{O}(1/N)$.
  • Following Eq. (5), we need to switch the derivation from the expected loss of $\pi_i$ over $d_{\pi_i}$ (i.e., $\mathbb{E}_{s \sim d_{\pi_i}}[\ell(s, \pi_i)]$) to that over $D_i$ (i.e., $\mathbb{E}_{s \sim D_i}[\ell(s, \pi_i)]$), the actual state distribution that $\pi_i$ is trained on. To fill this gap, we introduce $Y_{ij}$ to denote the difference between the expected loss of $\pi_i$ under $d_{\pi_i}$ and the average loss of $\pi_i$ under the $j$-th sample trajectory with $\pi_i$ at iteration $i$. The random variables $Y_{ij}$ over all $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, m\}$ are all zero-mean, bounded in $[-\ell_{\max}, \ell_{\max}]$, and form a martingale in the order $Y_{11}, Y_{12}, \ldots, Y_{1m}, Y_{21}, \ldots, Y_{Nm}$. By the Azuma-Hoeffding inequality (Azuma, 1967; Hoeffding, 1994), with probability at least $1 - \delta$, $\frac{1}{mN}\sum_{i,j} Y_{ij} \le \ell_{\max}\sqrt{2\ln(1/\delta)/(mN)}$.
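For reference, the textbook form of the Azuma-Hoeffding bound being invoked, restated here under the boundedness above rather than copied from the paper: for a martingale difference sequence $X_1, \ldots, X_n$ with $|X_k| \le c$,

$$\Pr\Big(\frac{1}{n}\sum_{k=1}^{n} X_k \ge \epsilon\Big) \le \exp\Big(-\frac{n\epsilon^2}{2c^2}\Big), \qquad \text{so with probability at least } 1-\delta,\quad \frac{1}{n}\sum_{k=1}^{n} X_k \le c\sqrt{\frac{2\ln(1/\delta)}{n}}.$$

Here $n = mN$ and $c = \ell_{\max}$, which yields the bound stated above.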
  • Our system assumes an interactive semantic parsing framework to collect user feedback. In experiments, this is implemented by adapting MISP (Yao et al., 2019b), an open-sourced framework that has demonstrated a strong ability to improve test-time parsing accuracy. In this framework, an agent comprises three components: a world model, which wraps the base semantic parser and a feedback incorporation module to interpret user feedback and update the semantic parse; an error detector, which decides whether to request user intervention; and an actuator, which delivers the agent’s request by asking a natural language question that users without domain expertise can understand.
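A minimal sketch of how these three components could interact at parse time; the class and method names below are illustrative assumptions, not MISP's actual API.

```python
class InteractiveAgent:
    """Illustrative MISP-style agent: world model + error detector + actuator."""

    def __init__(self, world_model, error_detector, actuator):
        self.world_model = world_model        # wraps the base parser and feedback incorporation
        self.error_detector = error_detector  # decides when to request user intervention
        self.actuator = actuator              # verbalizes the request as a natural language question

    def parse(self, question, ask_user):
        state = self.world_model.initial_state(question)
        while not self.world_model.is_terminal(state):
            action = self.world_model.predict(state)
            if self.error_detector.is_uncertain(state, action):
                nl_request = self.actuator.verbalize(state, action)
                feedback = ask_user(nl_request)  # e.g., yes/no or a selected option
                action = self.world_model.incorporate_feedback(state, action, feedback)
            state = self.world_model.advance(state, action)
        return self.world_model.finalize(state)  # the completed semantic parse
```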
  • User Simulator. Our experiments train each system with simulated user feedback. To this end, we build a user simulator similar to the one used by Yao et al. (2019b), which has access to the ground-truth SQL queries. It gives a yes/no answer or selects a choice by directly comparing the sampled policy action with the true one in the gold query (see the sketch below).
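A minimal sketch of such a simulator, under the assumption that a gold query is represented as a sequence of gold decisions; the class is illustrative, not the authors' code.

```python
from typing import List, Optional, Sequence

class SimulatedUser:
    """Simulated user with access to the ground-truth SQL query."""

    def __init__(self, gold_actions: Sequence[str]):
        self.gold_actions = gold_actions  # the true decision at each step

    def answer_yes_no(self, step: int, proposed_action: str) -> bool:
        # Validate the sampled policy action against the gold decision.
        return proposed_action == self.gold_actions[step]

    def select_choice(self, step: int, choices: List[str]) -> Optional[int]:
        # Pick the option that matches the gold decision, if it is offered.
        gold = self.gold_actions[step]
        return choices.index(gold) if gold in choices else None
```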
  • In the data preprocessing step, EditSQL (Zhang et al., 2019) transforms each gold SQL query into a sequence of tokens, where the FROM clause is removed and each column Col is prefixed with its paired table name, i.e., Tab.Col. However, we…