Benchmarking Multimodal Regex Synthesis with Complex Structures

ACL, pp. 6081-6094, 2020.


Abstract:

Existing datasets for regular expression (regex) generation from natural language are limited in complexity; compared to regex tasks that users post on StackOverflow, the regexes in these datasets are simple, and the language used to describe them is not diverse. We introduce StructuredRegex, a new regex synthesis dataset differing from…
Introduction
  • Regular expressions are known for their usefulness and wide applicability, and yet they are hard to understand and write, even for many programmers (Friedl, 2006).
  • Recent research has studied how to construct regexes from natural language (NL) descriptions, leading to the emergence of NL-to-regex datasets, beginning with KB13 (Kushman and Barzilay, 2013).
  • Locascio et al. (2016) subsequently employed a generate-and-paraphrase procedure (Wang et al., 2015) to create the larger NL-TURK dataset.
  • The regexes in this dataset are very simple, and the descriptions are short, formulaic, and not linguistically diverse because of the paraphrasing annotation procedure (Herzig and Berant, 2019).
  • The limited size of this dataset makes it …
  • An example task: the author needs to validate a pattern that starts with “C0” and ends with exactly 4 digits; in the paper's DSL, this is written and(startwith(…), endwith(rep(…, 4))).
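As a rough standard-regex rendering of that example (illustrative only; the dataset itself pairs tasks with the paper's DSL), the constraint can be checked directly in Python:

```python
import re

# Strings that start with "C0" and end with 4 digits.  Note that `\d{4}$`
# only checks for 4 trailing digits; enforcing *exactly* 4 would need a
# guard such as a lookbehind, omitted here for simplicity.
pattern = re.compile(r"^C0.*\d{4}$")

assert pattern.match("C0-ab1234")       # positive example
assert not pattern.match("X0-ab1234")   # wrong prefix
assert not pattern.match("C0-ab12")     # too few trailing digits
```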
Highlights
  • Regular expressions are known for their usefulness and wide applicability, and yet they are hard to understand and write, even for many programmers (Friedl, 2006)
  • Our DEEPREGEX + APPROX achieves the best accuracy, with 5.6% and 7.9% improvement over DEEPREGEX + FILTER on Test and Test-E, respectively, since it can leverage examples more effectively using over- and under-approximations during search
  • DEEPREGEX trained on NL-TURK completely fails on STACKOVERFLOW and even fails to predict reasonable regexes that are consistent with the examples
  • DEEPREGEX trained on our dataset can at least achieve 9.8% accuracy on the STACKOVERFLOW dataset, because the English descriptions in this dataset better match the desired task
  • Our DEEPREGEX + APPROX model successfully solves 13.7% of the tasks and finds consistent regexes for 38% of them, which is respectable given that the performance of the same model on the Test-E set is only 30%
  • Our dataset contains compositionally structured regexes paired with linguistically diverse language, and organically includes distinguishing examples
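The post-hoc FILTER strategy contrasted against APPROX above can be sketched as follows. This is an illustrative reimplementation, not the paper's code: the function name is hypothetical, and "consistent" simply means matching every positive and rejecting every negative example.

```python
import re

def filter_kbest(candidates, positives, negatives):
    """Return the first candidate regex consistent with all examples,
    falling back to the top-ranked candidate (post-hoc filtering sketch)."""
    for cand in candidates:
        try:
            rx = re.compile(cand)
        except re.error:
            continue  # skip malformed predictions
        if all(rx.fullmatch(p) for p in positives) and \
           not any(rx.fullmatch(n) for n in negatives):
            return cand
    return candidates[0]

# Hypothetical 3-best list for "starts with 'ab', then digits":
kbest = [r"ab\w*", r"ab\d+", r"a+b\d+"]
print(filter_kbest(kbest, ["ab12", "ab3"], ["abx1"]))  # -> ab\d+
```

The top-ranked candidate `ab\w*` matches the negative example `abx1`, so filtering promotes the second candidate, which is consistent with all examples.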
Methods
  • The authors evaluate the accuracy of both existing grammar-based approaches and neural models, as well as a novel method that targets the multimodal nature of the dataset.
  • The core idea of the approach is that, for each partially completed regex during decoding, the authors use the approximation technique to infer whether the regex can possibly match all positive or reject all negative examples.
  • If this is impossible, the authors can prune this partial regex from the search.
  • This approach allows them to more effectively explore the set of plausible regexes without increasing the computational budget or beam size
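A minimal sketch of the pruning idea, assuming a left-to-right decoder whose partial output is a regex prefix: over-approximate the prefix by completing it with the most permissive suffix `.*`, and prune the beam entry when some positive example can no longer match. The paper's actual over- and under-approximations operate on its DSL rather than raw regex strings; this is only an analogy.

```python
import re

def over_approx_ok(partial, positives):
    """Over-approximation check: complete the partial regex with '.*' and
    test whether every positive example can still be matched.  If not, no
    completion of this prefix can succeed, so it may be pruned (sketch)."""
    try:
        rx = re.compile("(?:" + partial + ").*")
    except re.error:
        return True  # prefix is not yet a valid regex; cannot decide
    return all(rx.fullmatch(p) for p in positives)

positives = ["abc123", "abd456"]
assert over_approx_ok(r"ab", positives)        # keep: both can still match
assert not over_approx_ok(r"ab\d", positives)  # prune: 'c'/'d' is not a digit
```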
Results
  • Detailed Results on STRUCTUREDREGEX

    Table 5 breaks down accuracy by regex template on both the Test and Test-E sets.
  • As shown in Table 6, DEEPREGEX trained on NL-TURK completely fails on STACKOVERFLOW and even fails to predict reasonable regexes that are consistent with the examples.
  • This is caused by the fact that the NL-TURK dataset contains formulaic descriptions and shallow regexes that are not representative of real-world tasks.
  • The authors believe the transfer results show that improved performance on the dataset may transfer to STACKOVERFLOW as well, since some of the challenges present in the Test-E set are shared by the real-world STACKOVERFLOW tasks
Conclusion
  • The authors introduce STRUCTUREDREGEX, a new dataset for regex synthesis from natural language and examples.
  • The authors' dataset contains compositionally structured regexes paired with linguistically diverse language, and organically includes distinguishing examples.
  • Better methods are needed to solve this dataset; the authors show that such methods might generalize well to real-world settings
Tables
  • Table1: Statistics of our dataset and prior datasets. Compared to KB13 and NL-TURK, our dataset contains more diverse language and more complex regexes, comparable to the real STACKOVERFLOW dataset
  • Table2: Qualitative analysis on 150 descriptions from NL-TURK and our dataset (50 from each template). We show the percentage of examples containing each phenomenon. Our dataset features more of these challenging linguistic phenomena compared to prior synthetic datasets
  • Table3: Distribution mismatch analysis with respect to STACKOVERFLOW on past datasets and our dataset. Our dataset covers significantly more words and regexes, and is closer to the real-world dataset
  • Table4: DFA-equivalent accuracy on prior datasets and our dataset. The performance on our dataset using any model is much lower than the performance on existing datasets
  • Table5: Results for models trained and tested on STRUCTUREDREGEX. Using the examples (the latter two methods) gives a substantial accuracy boost, and DEEPREGEX + APPROX is better than the post-hoc FILTER method, but still only achieves 48.2% accuracy on Test and 36.0% on Test-E. Separation regexes are more difficult than the other two classes, and performance for all models drops on Test-E
  • Table6: The performance on STACKOVERFLOW-51 with models trained on NL-TURK and our dataset. We report the fraction of examples where a DFA-equivalent regex is found (Acc), where a DFA-equivalent regex is found in the k-best list, and where a regex consistent with the examples appears in the k-best list. Models trained on NL-TURK do not perform well in this setting, while our models can solve some examples
  • Table7: Our regex DSL and the corresponding constructions in standard regular language. Our regex DSL is as expressive as and can be easily translated to standard regex syntax
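Table 7's claim that the DSL translates easily to standard regex syntax can be illustrated with a toy translator. The constructor names follow the example in the Introduction (and, startwith, endwith, rep, and a digit class); the AST encoding and the handling of intersection via lookahead are assumptions of this sketch, not the paper's implementation.

```python
import re

def to_regex(node):
    """Translate a tiny DSL fragment into standard regex syntax (sketch)."""
    op, args = node
    if op == "num":                       # a single digit
        return r"\d"
    if op == "rep":                       # rep(e, k): exactly k repetitions
        return "(?:%s){%d}" % (to_regex(args[0]), args[1])
    if op == "startwith":                 # startwith(s): literal prefix
        return re.escape(args[0]) + ".*"
    if op == "endwith":                   # endwith(e): literal suffix pattern
        return ".*" + to_regex(args[0])
    if op == "and":                       # intersection via lookahead
        return "".join("(?=%s$)" % to_regex(a) for a in args[:-1]) \
               + to_regex(args[-1])
    raise ValueError("unknown DSL operator: " + op)

# starts with "C0" and ends with 4 digits
ast = ("and", [("startwith", ["C0"]),
               ("endwith", [("rep", [("num", []), 4])])])
rx = re.compile(to_regex(ast) + "$")
```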
Related work
  • Data collection in semantic parsing: Collecting large-scale data for semantic parsing and related tasks is a long-standing challenge (Berant et al., 2013; Wang et al., 2015). Wang et al. (2015) proposed the generate-and-paraphrase framework, which has been adopted to collect datasets in various domains (Locascio et al., 2016; Ravichander et al., 2017; Johnson et al., 2017). However, this process often biases annotators towards using formulaic language (Ravichander et al., 2017; Herzig and Berant, 2019).

    Similar to our work, past work has sought to elicit linguistically diverse data using visual elements for semantic parsing (Long et al, 2016), natural language generation (Novikova et al, 2016), and visual reasoning (Suhr et al, 2017, 2019). However, for these other tasks, the images used are depictions of an inherently graphical underlying world state; e.g., the NLVR dataset (Suhr et al, 2017) and NLVR2 (Suhr et al, 2019) are based on reasoning over the presented images, and the Tangrams dataset (Long et al, 2016) involves describing shape transformations. By contrast, regexes are typically represented as source code; there is no standard graphical schema for depicting the patterns they recognize. This changes the properties of the generated descriptions, leading to higher levels of compositionality and ambiguity because what’s being described is not naturally an image.
Funding
  • This work was partially supported by NSF Grant IIS-1814522, NSF Grant SHF-1762299, a gift from Arm, and an equipment grant from NVIDIA
References
  • Rajeev Alur, Rastislav Bodik, Garvin Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-Guided Synthesis. In 2013 Formal Methods in Computer-Aided Design (FMCAD).
  • Jacob Andreas, Dan Klein, and Sergey Levine. 2018. Learning with Latent Language. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).
  • Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. In Proceedings of the International Conference on Machine Learning (ICML).
  • Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-driven Learning. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
  • Jeffrey E. F. Friedl. 2006. Mastering Regular Expressions. O'Reilly Media, Inc.
  • Mor Geva, Yoav Goldberg, and Jonathan Berant. 2019. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
  • Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).
  • Sumit Gulwani and Prateek Jain. 2017. Programming by Examples: PL Meets ML. In Proceedings of the Asian Symposium on Programming Languages and Systems (APLAS).
  • Jonathan Herzig and Jonathan Berant. 2019. Don't Paraphrase, Detect! Rapid and Effective Data Collection for Semantic Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
  • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. 2017. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS).
  • Nate Kushman and Regina Barzilay. 2013. Using Semantic Unification to Generate Regular Expressions from Natural Language. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).
  • Mina Lee, Sunbeom So, and Hakjoo Oh. 2016. Synthesizing Regular Expressions from Examples for Introductory Automata Assignments. In Proceedings of the ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE).
  • Nicholas Locascio, Karthik Narasimhan, Eduardo DeLeon, Nate Kushman, and Regina Barzilay. 2016. Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Reginald Long, Panupong Pasupat, and Percy Liang. 2016. Simpler Context-Dependent Logical Forms via Model Projections. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Anders Møller. 2017. dk.brics.automaton – finite-state automata and regular expressions for Java. http://www.brics.dk/automaton/.
  • Jekaterina Novikova, Oliver Lemon, and Verena Rieser. 2016. Crowd-sourcing NLG Data: Pictures Elicit Better Data. In Proceedings of the International Natural Language Generation Conference (INLG).
  • Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, and Armando Solar-Lezama. 2019. Learning to Infer Program Sketches. In Proceedings of the International Conference on Machine Learning (ICML).
  • Aarne Ranta. 1998. A Multilingual Natural-Language Interface to Regular Expressions. In Finite State Methods in Natural Language Processing.
  • Abhilasha Ravichander, Thomas Manzini, Matthias Grabmair, Graham Neubig, Jonathan Francis, and Eric Nyberg. 2017. How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing. In Proceedings of the Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL).
  • Alane Suhr, Mike Lewis, James Yeh, and Yoav Artzi. 2017. A Corpus of Natural Language for Visual Reasoning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, and Yoav Artzi. 2019. A Corpus for Reasoning about Natural Language Grounded in Photographs. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Xinyu Wang, Sumit Gulwani, and Rishabh Singh. 2016. FIDEX: Filtering Spreadsheet Data Using Examples. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA).
  • Yushi Wang, Jonathan Berant, and Percy Liang. 2015. Building a Semantic Parser Overnight. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query Synthesis from Natural Language. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA).
  • Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, and Greg Durrett. 2019. Sketch-Driven Regular Expression Generation from Natural Language and Examples. arXiv preprint arXiv:1908.05848.
  • Zexuan Zhong, Jiaqi Guo, Wei Yang, Tao Xie, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2018. Generating Regular Expressions from Natural Language Specifications: Are We There Yet? In the Statistical Modeling of Natural Software Corpora Workshop at the AAAI Conference on Artificial Intelligence (AAAI Workshop).