Estimating the Causal Effects of Natural Logic Features in Neural NLI Models
arXiv (2023)
Abstract
Rigorous evaluation of the causal effects of semantic features on language
model predictions can be hard to achieve for natural language reasoning
problems. However, this form of analysis is so desirable, from both an
interpretability and a model evaluation perspective, that it is valuable to zero
in on specific patterns of reasoning with enough structure and regularity to be
able to identify and quantify systematic reasoning failures in widely used
models. In this vein, we pick a portion of the NLI task for which an explicit
causal diagram can be systematically constructed: in particular, the case where
two related words or terms occur in a shared context across two sentences (the
premise and the hypothesis). In this work, we apply causal effect estimation
strategies to measure the effect of context interventions (whose effect on the
entailment label is mediated by the semantic monotonicity characteristic) and
interventions on the inserted word-pair (whose effect on the entailment label
is mediated by the relation between these words). Following related work on
causal analysis of NLP models in different settings, we adapt the methodology
for the NLI task to construct comparative model profiles in terms of robustness
to irrelevant changes and sensitivity to impactful changes.