We propose a parser-independent interactive approach, PIIA, to enhance the text-to-SQL process in Natural Language Interfaces to Databases systems

“What Do You Mean by That?” A Parser Independent Interactive Approach for Enhancing Text to SQL

Empirical Methods in Natural Language Processing (EMNLP), pp. 6913-6922 (2020)


Abstract

In Natural Language Interfaces to Databases systems, the text-to-SQL technique allows users to query databases by using natural language questions. Though significant progress in this area has been made recently, most parsers may fall short when they are deployed in real systems. One main reason stems from the difficulty of fully understa...

Introduction
  • The past few years have witnessed a burgeoning interest in the study of text-to-SQL, the essential technique for Natural Language Interfaces to Databases (NLIDB) systems (Guo et al, 2019; Hwang et al, 2019; He et al, 2019a; Bogin et al, 2019a,b).
  • Some works have tried to involve users in checking SQL queries (Li and Jagadish, 2014; Iyer et al, 2017; Yaghmazadeh et al, 2017), but this is impractical in real systems, since it succeeds only if users have a very good knowledge of SQL.
  • In another attempt to involve users, Gur et al (2018) proposed interacting with non-expert users through multi-choice questions.
  • Treating parsers as black boxes, it is therefore essential to study an interactive approach that enhances the text-to-SQL technique in complex scenarios.
Highlights
  • The past few years have witnessed a burgeoning interest in the study of text-to-SQL, the essential technique for Natural Language Interfaces to Databases (NLIDB) systems (Guo et al, 2019; Hwang et al, 2019; He et al, 2019a; Bogin et al, 2019a,b)
  • Though significant progress has been made in this field, most parsers are still less than desirable when deployed in real NLIDB systems
  • Some works have tried to involve users in checking SQL queries (Li and Jagadish, 2014; Iyer et al, 2017; Yaghmazadeh et al, 2017), but this is impractical in real systems, since it succeeds only if users have a very good knowledge of SQL
  • We propose a Parser-Independent Interactive Approach (PIIA) to interact with human users and help parsers better understand natural language (NL) questions
  • We propose a parser-independent interactive approach, PIIA, to enhance the text-to-SQL process in NLIDB systems
  • PIIA interacts with users via multi-choice questions and can be built on arbitrary parsers
Methods
  • While querying databases in an NLIDB system, users pose a natural language question that is denoted as x.
  • The text-to-SQL parser takes x as input and predicts a SQL query, which is denoted as y.
  • Users are not experts in database querying, so they may pose natural language questions with inexplicit expressions.
  • The authors build PIIA upon parsers that can interactively revise inexplicit expressions in x with the help of users’ feedback, enhancing the performance of text-to-SQL
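
The Methods bullets describe a loop in which the parser stays a black box and only the question x is revised before parsing. A minimal Python sketch of such a loop is given here, under stated assumptions: the callables `locate_uncertain`, `propose_options`, and `ask_user` are hypothetical stand-ins whose interfaces are not given on this page, `max_turns` is an illustrative cap, and the singer/country example is invented.

```python
from typing import Callable, List, Optional

# A black-box text-to-SQL parser: NL question in, SQL string out.
Parser = Callable[[str], str]


def interactive_parse(
    question: str,
    parser: Parser,
    locate_uncertain: Callable[[str], List[str]],
    propose_options: Callable[[str, str], List[str]],
    ask_user: Callable[[str, List[str]], Optional[str]],
    max_turns: int = 5,
) -> str:
    """Revise inexplicit tokens in the question via user feedback, then parse.

    The parser is only called on the revised question, so any parser with
    the `Parser` signature can be plugged in (hypothetical interface).
    """
    revised = question
    # One multi-choice question per uncertain token, capped at max_turns.
    for token in locate_uncertain(question)[:max_turns]:
        options = propose_options(revised, token)
        choice = ask_user(token, options)  # None models a "none of these" answer
        if choice is not None:
            revised = revised.replace(token, choice, 1)
    return parser(revised)


# Toy usage with stub components (all hypothetical).
def echo_parser(question: str) -> str:
    return f"PARSED[{question}]"


result = interactive_parse(
    "how many singers are from France",
    parser=echo_parser,
    locate_uncertain=lambda q: ["France"],
    propose_options=lambda q, tok: [f"country {tok}", f"citizenship {tok}"],
    ask_user=lambda tok, opts: opts[0],
)
print(result)  # PARSED[how many singers are from country France]
```

Because the loop touches only the question string, swapping `echo_parser` for a different parser requires no other change, which is the sense in which the sketch is parser-independent.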
Results
  • By interacting with real users, PIIA raises the overall SQLAcc of IRNet and IRNet+BERT by 3.7 and 1.5 percentage points (absolute), respectively.
  • As shown in Table 1, PIIA with the NL Modifier improves the SQLAcc of IRNet from 53.2% to 59.3%
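
To make the term "absolute improvement" concrete, the short Python sketch below works through the IRNet simulation figures quoted in the Results (53.2% to 59.3%). The function names are illustrative only, and the relative gain is a derived figure, not one reported on this page.

```python
def absolute_gain(before: float, after: float) -> float:
    """Difference in percentage points."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Improvement as a fraction of the starting accuracy."""
    return (after - before) / before


before, after = 53.2, 59.3  # IRNet SQLAcc without and with PIIA (simulation)
print(f"absolute: {absolute_gain(before, after):.1f} points")
print(f"relative: {relative_gain(before, after):.1%}")
```

The absolute gain is about 6.1 percentage points, which corresponds to a relative gain of roughly 11.5% over the base parser.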
Conclusion
  • The authors propose a parser-independent interactive approach, PIIA, to enhance the text-to-SQL process in NLIDB systems.
  • PIIA interacts with users via multi-choice questions and can be built on arbitrary parsers.
  • Experimental results show this approach leads to significant performance boosts on two cross-domain datasets with five different base parsers.
  • The authors are interested in distilling and reusing the common knowledge from users’ selections
Summary
  • Objectives:

    The authors' goal is to increase s(x, xpos) and decrease s(x, xneg).
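
The stated objective, increasing s(x, xpos) while decreasing s(x, xneg), is the shape of a ranking objective over triples (x, xpos, xneg). The Python sketch below shows one common way to express it, a margin-based hinge loss. This is an assumption for illustration: the page does not give the authors' actual loss or scoring model, and `token_cosine` is a hypothetical bag-of-words stand-in for s.

```python
import math
from collections import Counter


def token_cosine(a: str, b: str) -> float:
    """Hypothetical stand-in for s(., .): cosine over token counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def triplet_hinge_loss(s_pos: float, s_neg: float, margin: float = 0.5) -> float:
    """Zero once s_pos exceeds s_neg by at least `margin` (assumed value)."""
    return max(0.0, margin - s_pos + s_neg)


# Invented example triple.
x = "how many singers are from France"
x_pos = "how many singers are from country France"
x_neg = "list the names of all stadiums"
loss = triplet_hinge_loss(token_cosine(x, x_pos), token_cosine(x, x_neg))
print(loss)
```

Minimising such a loss pushes the positive score up and the negative score down until they are separated by the margin, after which the triple contributes nothing.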
Tables
  • Table 1: Simulation results of PIIA on the WikiSQL test set and the Spider development set
  • Table 2: SQL Accuracy of human evaluation (H) and simulation (S) on 300 samples
  • Table 3: Cases by IRNet+PIIA on Spider. Texts highlighted in gray indicate column names in the databases
Related work
  • The works most related to ours are those investigating interactive semantic parsing. For instance, DialSQL, proposed by Gur et al (2018), aims to detect error spans and their categories based on an encoder-decoder architecture, but it is designed for relatively simple scenarios. Another impressive work in this area is a model-based interaction system, which detects uncertain tokens and asks questions relying on inner parser states (Yao et al, 2019). Unlike these studies, we design a parser-independent interactive approach that can also handle cross-domain complex SQL queries. In the field of applied systems, Gao et al (2015) focused on user interface design and proposed an interactive semantic parsing system called Datatone. In contrast, our main contribution lies in the realm of technology. Another topic our method is related to is query reformulation. The idea of query reformulation is explored by Ray et al (2018) and Rastogi et al (2019), although they apply it in other domains with different scenarios. Our work is also related to semantic parsing, the process of converting natural language utterances into logical forms. Sequence-to-sequence methods are widely applied to this task (Berant et al, 2013; Dong and Lapata, 2016; Finegan-Dollak et al, 2018; Su et al, 2018). To reduce the search space for decoding, several works employed intermediate representations to generate abstract representations (Cheng et al, 2017; Goldman et al, 2018; Dong and Lapata, 2018). Although these methods have achieved an impressive performance in experimental studies, there is still a long way to go before they can be successfully
Funding
  • This work was supported in part by NSFC under Grant No 61532001, National Key Research and Development Program of China under Grant No 2018AAA0101902, and MOE-ChinaMobile Program under Grant No MCM20170503
Study subjects and analysis
random samples: 50
The perturbed samples share the same uninformative tokens as xpos and help the model focus on the alignment of informative tokens. We generate 50 random samples and 50 perturbed samples for each (x, xpos) pair. By generating negative samples, we obtain the training data composed of triples (x, xpos, xneg)
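
The negative-sampling step above can be sketched in Python as follows. Only the counts (50 random and 50 perturbed negatives per pair) and the property that perturbed samples keep the uninformative tokens of xpos come from the text; how informative tokens are replaced, the `uninformative` word set, and the `vocab` and `pool` inputs are assumptions made for illustration.

```python
import random
from typing import List, Set, Tuple


def random_negatives(pool: List[str], x_pos: str, k: int, rng: random.Random) -> List[str]:
    """Draw k negatives from other questions (pool must hold at least one)."""
    candidates = [q for q in pool if q != x_pos]
    return [rng.choice(candidates) for _ in range(k)]


def perturbed_negatives(
    x_pos: str, uninformative: Set[str], vocab: List[str], k: int, rng: random.Random
) -> List[str]:
    """Keep uninformative tokens of x_pos; swap every other token (assumed scheme)."""
    samples = []
    for _ in range(k):
        tokens = []
        for tok in x_pos.split():
            if tok in uninformative:
                tokens.append(tok)
            else:
                # Pick a different single-word token; fall back if none exists.
                alternatives = [v for v in vocab if v != tok] or [tok]
                tokens.append(rng.choice(alternatives))
        samples.append(" ".join(tokens))
    return samples


def build_triples(
    x: str,
    x_pos: str,
    pool: List[str],
    uninformative: Set[str],
    vocab: List[str],
    k: int = 50,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Return k random plus k perturbed triples (x, x_pos, x_neg)."""
    rng = random.Random(seed)
    negatives = random_negatives(pool, x_pos, k, rng)
    negatives += perturbed_negatives(x_pos, uninformative, vocab, k, rng)
    return [(x, x_pos, x_neg) for x_neg in negatives]
```

With the default `k=50`, each (x, xpos) pair yields 100 triples, the first 50 from random questions and the last 50 from perturbations.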

hand-annotated pairs: 80654
We conduct experiments on two cross-domain text-to-SQL datasets with five base parsers. The WikiSQL dataset (Zhong et al, 2017) collects 24,241 cross-domain single-table databases from Wikipedia and contains 80,654 hand-annotated pairs of NL questions and SQL queries. The SQL queries are relatively simple with only SELECT and WHERE clauses

samples: 15878
(2) SQLNet (Xu et al, 2017) applies sequence-to-set prediction and employs a sketch-based approach to predict SQL queries. We report our PIIA results on the test set, which contains 15,878 samples. The Spider dataset (Yu et al, 2018b) is a human-labeled text-to-SQL dataset that consists of 10,181 NL questions and 5,693 unique complex SQL queries on 200 databases with multiple tables

samples: 1034
(2) IRNet+BERT takes BERT as the NL encoder to enhance the performance of basic IRNet. (3) SyntaxSQLNet (Yu et al, 2018a) employs a SQL-specific syntax-tree-based decoder and table-aware column attention encoders. The test set is not publicly available, so we evaluate PIIA on the development set, which contains 1,034 samples. In Error Locator, the similarity threshold is set to be the average score of all the (x, x ) pairs in the training triples X = {(x, xpos, xneg)} as follows: WikiSQL

volunteers: 30
The human evaluation is performed on the more complex Spider dataset and with two state-of-the-art parsers, i.e., IRNet and IRNet+BERT. We randomly sample 300 NL questions from the Spider development set and invite 30 volunteers majoring in liberal arts to interact with the PIIA agent. Each NL question is evaluated by three volunteers, all of whom are non-experts without any background knowledge of SQL queries. We provide them with the NL questions and the corresponding databases, and they interact with PIIA by answering the multi-choice questions

cases: 1034
As observed, the interaction process finishes in four turns in nearly 90% of the cases. Only 10 out of 1,034 cases require an interaction process with more than five turns, which indicates PIIA is able to process such a complex dataset with high efficiency. We also analyze the cases correctly modified by the simulation and find that about 40% of the multi-choice questions get the None answer, which is an acceptable percentage

cases: 6
The words in bold are uncertain tokens and their corrections. Though IRNet wrongly parses these six cases, PIIA manages to solve them correctly. The first five cases are modified by rules for nouns, verbs, and adjectives that are related to the column names in the databases. Different rules are applied to add the column names into NL questions, making them more explicit for the parser to understand

Reference
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In EMNLP.
  • Ben Bogin, Matt Gardner, and Jonathan Berant. 2019a. Global reasoning over database structures for text-to-SQL parsing. In EMNLP.
  • Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b. Representing schema structure with graph neural networks for text-to-SQL parsing. In ACL.
  • Jianpeng Cheng, Siva Reddy, Vijay Saraswat, and Mirella Lapata. 2017. Learning structured natural language representations for semantic parsing. In ACL.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Kedar Dhamdhere, Kevin S McCurley, Ralfi Nahmias, Mukund Sundararajan, and Qiqi Yan. 2017. Analyza: Exploring data with conversation. In IUI.
  • Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In ACL.
  • Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In ACL.
  • Catherine Finegan-Dollak, Jonathan K Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir Radev. 2018. Improving text-to-SQL evaluation methodology. In ACL.
  • Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G Karahalios. 2015. Datatone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 489–500.
  • Alfio Gliozzo, Or Biran, Siddharth Patwardhan, and Kathleen McKeown. 2013. Semantic technologies in IBM Watson. In Proceedings of the Fourth Workshop on Teaching NLP and CL, pages 85–92.
  • Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, and Jonathan Berant. 2018. Weakly-supervised semantic parsing with abstract examples. In ACL.
  • Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Jian Yin, Hong Chi, James Cao, Peng Chen, and Ming Zhou. 2018. Question generation from SQL queries improves neural semantic parsing. In EMNLP.
  • Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In ACL.
  • Izzeddin Gur, Semih Yavuz, Yu Su, and Xifeng Yan. 2018. DialSQL: Dialogue based structured query generation. In ACL.
  • Pengcheng He, Yi Mao, Kaushik Chakrabarti, and Weizhu Chen. 2019a. X-SQL: reinforce schema representation with context. arXiv preprint arXiv:1908.08113.
  • Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael R Lyu, and Shuming Shi. 2019b. Towards understanding neural machine translation with word importance. In EMNLP.
  • Wonseok Hwang, Jinyeung Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on WikiSQL with table-aware word contextualization. In KR2ML Workshop at NeurIPS.
  • Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In ACL.
  • Sanjaya Lai, Kedar Doshi, Yamuna Esaiarasan, and Chaitanya Bhatt. 2014. Systems and methods for performing record actions in a multi-tenant database and application system. US Patent 8,818,940.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Pushpendre Rastogi, Arpit Gupta, Tongfei Chen, and Lambert Mathias. 2019. Scaling multi-domain dialogue state tracking via query reformulation. arXiv preprint arXiv:1903.05164.
  • Avik Ray, Yilin Shen, and Hongxia Jin. 2018. Learning out-of-vocabulary words in intelligent personal agents. In IJCAI, pages 4309–4315.
  • Yu Su, Ahmed Hassan Awadallah, Miaosen Wang, and Ryen W White. 2018. Natural language interfaces with fine-grained user interaction: A case study on web APIs. In SIGIR, pages 855–864. ACM.
  • Xiaojun Xu, Chang Liu, and Dawn Song. 2017. SQLNet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436.
  • Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: query synthesis from natural language. Proceedings of the ACM on Programming Languages, 1(OOPSLA):63.
  • Ziyu Yao, Yu Su, Huan Sun, and Wen-tau Yih. 2019. Model-based interactive semantic parsing: A unified framework and a text-to-SQL case study. In EMNLP.
  • Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018a. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task. In EMNLP.
  • Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018b. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In EMNLP.
  • Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. In CoRR.
  • Joel Legrand, Michael Auli, and Ronan Collobert. 2016. Neural network-based word alignment through score aggregation. In SIGMT.
  • Fei Li and HV Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8(1):73–84.
  • Yang Liu and Maosong Sun. 2015. Contrastive unsupervised word alignment with non-local features. In AAAI.
  • Edward Loper and Steven Bird. 2002. NLTK: the natural language toolkit. arXiv preprint cs/0205028.