Generative Data Augmentation for Commonsense Reasoning

EMNLP (2020)

Abstract
Recent advances in commonsense reasoning depend on large-scale human-annotated training sets to achieve peak performance. However, manual curation of training sets is expensive and has been shown to introduce annotation artifacts that neural models can readily exploit and overfit to. We propose a novel generative data augmentation technique, G-DAUG^C, that aims to achieve more accurate and robust learning in a low-resource setting. Our approach generates synthetic examples using pretrained language models and selects the most informative and diverse set of examples for data augmentation. In experiments on multiple commonsense reasoning benchmarks, G-DAUG^C consistently outperforms existing data augmentation methods based on back-translation and establishes a new state of the art on WinoGrande, CODAH, and CommonsenseQA. It also enhances out-of-distribution generalization, proving robust against adversarial or perturbed examples. Our analysis demonstrates that G-DAUG^C produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
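The selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how diversity is measured, so the criterion used here (greedily adding the example that contributes the most previously unseen unigrams) is an assumption, and the function name `select_diverse` and the toy pool are hypothetical. This is not the authors' implementation.

```python
def select_diverse(pool, k):
    """Greedily pick up to k examples, each adding the most unseen unigrams.

    Assumption: diversity is approximated by the number of unique
    whitespace-separated, lowercased tokens covered by the selected set.
    """
    seen = set()              # unigrams covered by the selection so far
    selected = []
    remaining = list(pool)    # copy so the caller's list is not modified
    for _ in range(min(k, len(remaining))):
        # Gain of a candidate = number of its unigrams not yet covered.
        best = max(remaining, key=lambda ex: len(set(ex.lower().split()) - seen))
        selected.append(best)
        seen |= set(best.lower().split())
        remaining.remove(best)
    return selected


# Hypothetical pool of synthetic examples.
pool = ["the cat sat", "the cat sat down", "a dog barked loudly"]
chosen = select_diverse(pool, 2)
```

With this toy pool, the near-duplicate "the cat sat" is skipped because it adds no new unigrams once "the cat sat down" has been selected.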
Keywords
commonsense reasoning, generative data augmentation, G-DAUG