Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks
CoRR (2024)
Abstract
Descriptive grammars are highly valuable, but writing them is time-consuming
and difficult. Furthermore, while linguists typically use corpora to create
them, grammar descriptions often lack quantitative data. As for formal
grammars, they can be challenging to interpret. In this paper, we propose a new
method to extract and explore significant fine-grained grammar patterns and
potential syntactic grammar rules from treebanks, in order to create an
easy-to-understand corpus-based grammar. More specifically, we extract
descriptions and rules across different languages for two linguistic phenomena,
agreement and word order, using a large search space and paying special
attention to the ranking order of the extracted rules. To do so, we use a
linear classifier to extract the most salient features that predict the
linguistic phenomena under study. We associate statistical information with each
rule, and we compare the ranking of the model's results to those of other
quantitative and statistical measures. Our method captures both well-known and
less well-known significant grammar rules in Spanish, French, and Wolof.
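
As a rough illustration of the ranking idea, the sketch below fits an L1-regularized (sparse) logistic regression to a toy agreement dataset and ranks the features that keep a nonzero weight. It is not the paper's implementation: the feature names (`rel=nsubj`, `dep_precedes_head`), the counts, the hyperparameters, and the proximal-gradient optimizer are all assumptions chosen for illustration.

```python
import math

# Hypothetical binary features of a (head, dependent) pair; not taken from the paper.
NAMES = ["rel=nsubj", "dep_precedes_head"]

# Toy counts: (feature vector, label, count). Label 1 means "dependent agrees with head".
# "rel=nsubj" is informative; "dep_precedes_head" is balanced within every cell (pure noise).
CELLS = [
    ((1, 0), 1, 4), ((1, 1), 1, 4),
    ((1, 0), 0, 1), ((1, 1), 0, 1),
    ((0, 0), 1, 1), ((0, 1), 1, 1),
    ((0, 0), 0, 4), ((0, 1), 0, 4),
]
X = [list(f) for f, label, count in CELLS for _ in range(count)]
y = [label for f, label, count in CELLS for _ in range(count)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_sparse_logreg(X, y, l1=0.05, lr=0.5, epochs=500):
    """Proximal gradient descent on mean log-loss + l1 * ||w||_1 (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj / n, lr * l1) for wj, gj in zip(w, grad_w)]
    return w, b


def rank_features(names, w):
    """Keep features with nonzero weight, sorted by absolute weight (largest first)."""
    kept = [(name, wj) for name, wj in zip(names, w) if wj != 0.0]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


w, b = fit_sparse_logreg(X, y)
for name, weight in rank_features(NAMES, w):
    print(name, round(weight, 3))
```

The L1 penalty drives the weight of the uninformative feature to exactly zero, so only features that help predict the phenomenon survive as candidate rules. Their weights give one possible ranking, which could then be compared against other statistical measures.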