Automatic Extraction of Rules Governing Morphological Agreement

Aditi Chaudhary
Adithya Pratapa
David R. Mortensen
Zaid Sheikh

EMNLP 2020, pp. 5212-5236.

Abstract:

Creating a descriptive grammar of a language is an indispensable step for language documentation and preservation. However, at the same time it is a tedious, time-consuming task. In this paper, we take steps towards automating this process by devising an automated framework for extracting a first-pass grammatical specification from raw text [...]

Introduction
  • While the languages of the world are amazingly diverse, one thing they share in common is their adherence to grammars — sets of morpho-syntactic rules specifying how to create sentences in the language.
  • To create the training data for rule extraction, the authors first annotate raw text with part-of-speech (POS) tags, morphological analyses, and dependency trees.
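  To make the data construction concrete, here is a minimal sketch (not the authors' released code; the Token layout, the feature names, and the "mod" relation are illustrative assumptions) of how one dependency-parsed sentence yields "head POS, dependency relation, dependent POS" instances labeled with whether head and dependent match on a morphological feature:

```python
# Minimal sketch, not the authors' code: one parsed sentence in, labeled
# "head POS, dependency relation, dependent POS" instances out.
from dataclasses import dataclass

@dataclass
class Token:
    pos: str      # POS tag, e.g. "NOUN"
    feats: dict   # morphological analysis, e.g. {"Gender": "Fem"}
    head: int     # index of the syntactic head (-1 for the root)
    deprel: str   # dependency relation to the head

def instances(sentence, feature="Gender"):
    """Yield (head-POS, relation, dependent-POS, agrees) tuples."""
    for tok in sentence:
        if tok.head < 0:
            continue
        head = sentence[tok.head]
        # Only pairs where both tokens are marked for the feature count.
        if feature in tok.feats and feature in head.feats:
            yield (head.pos, tok.deprel, tok.pos,
                   tok.feats[feature] == head.feats[feature])

# Spanish "casa blanca": the adjective agrees with its noun in gender.
sent = [Token("NOUN", {"Gender": "Fem"}, -1, "root"),
        Token("ADJ", {"Gender": "Fem"}, 0, "mod")]
print(list(instances(sent)))  # [('NOUN', 'mod', 'ADJ', True)]
```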
Highlights
  • An important step in the understanding and documentation of languages is the creation of a grammar sketch, a concise and human-readable description of the unique characteristics of that particular language (e.g. Huddleston (2002) for English)
  • Automated Evaluation: As an alternative to the infeasible manual evaluation of all rules in every language, we propose an automated rule metric (ARM) that evaluates how well the rules extracted from decision tree T fit unseen gold-annotated test data (a sketch of one plausible reading follows this list)
  • We evaluate the quality of the rules induced by our framework, using gold-standard syntactic analyses and learning the decision trees over triples obtained from the training portion of all Syntactic Universal Dependencies (SUD) treebanks
  • In a reverse example from Catalan, the overwhelming majority (92%) of 8650 tokens are in the third person, causing our model to label all leaves as chance agreement despite the fact that person/number agreement is required in such cases
  • Data statistics are listed in Appendix A.2. We parse these sentences using the "universal" Udify model of Kondratyuk and Straka (2019), which has been pre-trained on all of the Universal Dependencies (UD) treebanks. We use these automatically parsed syntactic analyses to extract the rules, which we evaluate with ARM over the gold-standard test data of the corresponding SUD treebanks
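  The highlight above fixes ARM's ingredients: a per-triple test label derived from the empirical agreement rate on gold data, compared against the label the extracted rules assign. The following is a minimal sketch of one plausible reading, not the paper's implementation; the unweighted accuracy and the data layout are assumptions (the 0.95 test-label threshold is taken from the Results below):

```python
# Minimal sketch of ARM as described above; not the authors' code.
from collections import Counter

def test_labels(test_instances, threshold=0.95):
    """Derive a gold label per triple from its empirical agreement rate.

    test_instances: iterable of (triple, agrees) pairs, where triple is a
    (head-POS, relation, dependent-POS) tuple and agrees is a bool.
    """
    agree, total = Counter(), Counter()
    for triple, agrees in test_instances:
        total[triple] += 1
        agree[triple] += agrees
    return {t: "required-agreement" if agree[t] / total[t] > threshold
               else "chance-agreement"
            for t in total}

def arm(rule_labels, test_instances):
    """Fraction of triples whose extracted rule label matches the
    test-derived label (an unweighted simplification)."""
    gold = test_labels(test_instances)
    shared = [t for t in gold if t in rule_labels]
    return (sum(rule_labels[t] == gold[t] for t in shared) / len(shared)
            if shared else 0.0)
```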
Results
  • The authors set this threshold to 90% based on manually inspecting some resulting trees, choosing a value that limited the number of non-agreeing syntactic structures labeled as required-agreement (the leaf-labeling step is sketched after this list).
  • (Figure residue) Pipeline stages: (a) Rule Extraction, (b) Rule Labeling, (c) Rule Merging. An example extracted rule reads "Leaf 1: chance-agreement; relation = conj, det, comp:obj; head-pos = any; child-pos = noun", where chance agreement is specified by a null hypothesis.
  • Rule Learning: The authors use scikit-learn's (Buitinck et al., 2013) implementation of decision trees and train a separate model for each morphological feature f for a given language (see the training sketch after this list).
  • For each language/treebank, the authors extract and evaluate the top 20 most frequent "head POS, dependency relation, dependent POS" triples for the six morphological features, amounting to 120 sets of triples to be annotated. The authors present these triples with 10 randomly selected illustrative examples and ask a linguist to annotate whether there is a rule in this language governing agreement between the head-dependent pair for this relation.
  • The authors use a threshold of 0.95: if $q_{f,t} > 0.95$, they assign the test label $l_{test,f,t}$ for that triple as required-agreement, and otherwise choose chance-agreement. Similar to the human evaluation, the authors compute a score for each triple t marking feature f.
  • The authors evaluate the quality of the rules induced by the framework, using gold-standard syntactic analyses and learning the decision trees over triples obtained from the training portion of all SUD treebanks.
  • To compute the morphological complexity of a language, the authors use the word-entropy measure proposed by Bentz et al. (2016), which measures the average information content of words and is computed as $H(D) = -\sum_{i=1}^{|V|} p(w_i) \log p(w_i)$, where $V$ is the vocabulary, $D$ is the monolingual text extracted from the training portion of the respective treebank, and $p(w_i)$ is the word-type frequency normalized by the total token count (an entropy sketch follows this list).
  • Like person in Russian, the model produces required-agreement labels, which the authors can attribute to skewed data statistics in the treebanks.
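  As referenced in the bullets above, a compact sketch of the rule-learning and leaf-labeling steps: one scikit-learn decision tree per morphological feature, trained over one-hot encoded triples, with each leaf labeled required-agreement when its agreeing fraction clears the 90% threshold. The toy data and tree settings are placeholders, not the authors' configuration:

```python
# Illustrative sketch, not the released code.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy triples and agreement outcomes for one feature (e.g. Gender).
triples = [("NOUN", "mod", "ADJ"), ("VERB", "subj", "NOUN"),
           ("NOUN", "det", "DET"), ("VERB", "comp:obj", "NOUN")]
agrees = [1, 1, 1, 0]

enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(triples)
tree = DecisionTreeClassifier(max_depth=4).fit(X, agrees)

# tree_.value holds per-leaf class statistics (counts in older
# scikit-learn, fractions in newer); the agreeing fraction is the
# same ratio either way.
agree_idx = list(tree.classes_).index(1)
for leaf in np.unique(tree.apply(X)):
    stats = tree.tree_.value[leaf][0]
    frac = stats[agree_idx] / stats.sum()
    label = "required-agreement" if frac >= 0.9 else "chance-agreement"
    print(f"leaf {leaf}: {label} ({frac:.0%} agreeing)")
```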
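  The word-entropy measure from the complexity bullet above, computed here with the simple plug-in estimate for brevity (the Hausser and Strimmer (2009) entry in the reference list suggests a shrinkage estimator may be used in practice, so treat this as an approximation):

```python
import math
from collections import Counter

def word_entropy(tokens):
    """Plug-in estimate of H(D) = -sum p(w) log2 p(w) over word types."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Richer morphology spreads probability mass over more word types,
# which raises the entropy.
print(word_entropy("the cat saw the cat".split()))  # ~1.52 bits
```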
Conclusion
  • The authors observe that using cross-lingual transfer learning (CLTL) already leads to high scores across all languages, even in zero-shot settings where no data from the gold-standard treebank is used.
  • The authors use these automatically parsed syntactic analyses to extract the rules, which they evaluate with ARM over the gold-standard test data of the corresponding SUD treebanks.
Tables
  • Table1: The Spanish gender rules extracted in a zero-shot setting are generally similar to the ones extracted from the gold data (93%). We highlight the few mistakes that the zero-shot tree makes
  • Table2: Dataset statistics. Training data is obtained by parsing the Leipzig corpora (Goldhahn et al., 2012) and test data is obtained from the respective treebank. Each cell denotes the number of sentences in train/test
  • Table3: Dataset statistics. Train/Dev/Test denote the number of sentences in the respective treebank used for the target language
  • Table4: We used the same hyperparameters for training with related languages as specified by the authors. In the configuration file, we only change the parameters warmup steps = 100 and start-step = 100, as recommended by the authors for low-resource languages
  • Table5: Comparing the ARM scores for SUD treebanks across both Statistical and Hard thresholding
Related work
  • Bender et al. (2014) use interlinear glossed text (IGT) to extract lexical entities and morphological rules for an endangered language. They experiment with different systems that individually extract lemmas, lexical rules, word order and the case system, some of which use hand-specified rules. Howell et al. (2017) extend this work to predict case systems for additional languages. Zamaraeva (2016) also infers morphotactics from IGT, using k-means clustering. To the best of our knowledge, our work is the first to propose a framework to extract first-pass grammatical agreement rules directly from raw text in a statistically informed, objective way. A parallel line of work (Hellan, 2010) extracts a construction profile of a language via templates that define how sentences are constructed.
Funding
  • This work is sponsored by the DARPA grant FA8750-18-2-0018 and by the National Science Foundation under grant 1761548
References
  • Joseph Aoun, Elabbas Benmamoun, and Dominique Sportiche. 1994.
  • Emily M. Bender, Joshua Crowgey, Michael Wayne Goodman, and Fei Xia. 2014. Learning grammar specifications from IGT: A case study of Chintang. In Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 43–53, Baltimore, Maryland, USA. Association for Computational Linguistics.
  • Christian Bentz, Tatyana Ruzsics, Alexander Koplenig, and Tanja Samardžić. 2016. A comparison between morphological complexity measures: Typological data vs. language corpora. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 142–153, Osaka, Japan. The COLING 2016 Organizing Committee.
  • Robert D. Borsley and Ian Roberts. 2005. The Syntax of the Celtic Languages: A Comparative Perspective. Cambridge University Press.
  • Leo Breiman, Jerome Friedman, Charles J. Stone, and Richard A. Olshen. 1984. Classification and Regression Trees. CRC Press.
  • Keith Brown and Sarah Ogilvie. 2010. Concise Encyclopedia of Languages of the World. Elsevier.
  • Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122.
  • Robin Cohen. 1988. Book reviews: Reasoning and discourse processes. Computational Linguistics, 14(4).
  • Bernard Comrie. 1984. Reflections on verb agreement in Hindi and related languages.
  • Greville G. Corbett. 2009. Agreement. In Die slavischen Sprachen/The Slavic Languages.
  • Harald Cramér. 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton, page 500.
  • Dina B. Crockett. 1976. Agreement in Contemporary Standard Russian. Slavica Publishers Inc.
  • Kim Gerdes, Bruno Guillaume, Sylvain Kahane, and Guy Perrier. 2018. SUD or surface-syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 66–74, Brussels, Belgium. Association for Computational Linguistics.
  • Kim Gerdes, Bruno Guillaume, Sylvain Kahane, and Guy Perrier. 2019. Improving surface-syntactic Universal Dependencies (SUD): MWEs and deep syntactic features. In Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019), pages 126–132, Paris, France. Association for Computational Linguistics.
  • Dirk Goldhahn, Thomas Eckart, and Uwe Quasthoff. 2012. Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In LREC, volume 29, pages 31–43.
  • Zellig S. Harris. 1951. Methods in Structural Linguistics.
  • Jean Hausser and Korbinian Strimmer. 2009. Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. Journal of Machine Learning Research, 10(7).
  • Lars Hellan. 2010. From descriptive annotation to grammar specification. In Proceedings of the Fourth Linguistic Annotation Workshop, pages 172–176, Uppsala, Sweden. Association for Computational Linguistics.
  • Kristen Howell, Emily M. Bender, Michel Lockwood, Fei Xia, and Olga Zamaraeva. 2017. Inferring case systems from IGT: Enriching the enrichment. In Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 67–75.
  • Rodney D. Huddleston. 2002. The Cambridge Grammar of the English Language. Cambridge University Press, Cambridge, UK; New York.
  • Dan Kondratyuk and Milan Straka. 2019. 75 languages, 1 model: Parsing Universal Dependencies universally. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2779–2795, Hong Kong, China. Association for Computational Linguistics.
  • Joakim Nivre, Rogier Blokland, Niko Partanen, Michael Rießler, and Jack Rueter. 2018. Universal Dependencies 2.3.
  • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1659–1666.
  • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4034–4043, Marseille, France. European Language Resources Association.
  • J. Ross Quinlan. 1986. Induction of decision trees. Machine Learning, 1(1):81–106.
  • Gail M. Sullivan and Richard Feinn. 2012. Using effect size—or why the p value is not enough. Journal of Graduate Medical Education, 4(3):279–282.
  • Olga Zamaraeva. 2016. Inferring morphotactics from interlinear glossed text: Combining clustering and precision grammars. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 141–150, Berlin, Germany. Association for Computational Linguistics.
  • Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics.
Hyperparameters
  • The best parameters are selected based on validation-set performance. For treebanks that have no validation set, the authors use the default cross-validation provided by sklearn (Buitinck et al., 2013). Average model runtime is 5-10 minutes per treebank, depending on treebank size.
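  For the treebanks that fall back to cross-validation, the selection step might look like the following hedged sketch; GridSearchCV is standard scikit-learn, but the parameter grid is illustrative and not taken from the paper:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space; the paper does not publish its grid here.
search = GridSearchCV(
    DecisionTreeClassifier(),
    param_grid={"max_depth": [4, 8, 16, None],
                "min_samples_leaf": [1, 10, 50]},
    cv=5,  # k-fold cross-validation when no held-out dev set exists
)
# search.fit(X, y); search.best_params_ then gives the chosen setting.
```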