Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
EMNLP 2020, pp. 9275-9293, 2020.
Abstract:
Large datasets have become commonplace in NLP research. However, the increased emphasis on data quantity has made it challenging to assess the quality of data. We introduce Data Maps, a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instan...