Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
EMNLP 2020, pp. 9275-9293.
Large datasets have become commonplace in NLP research. However, the increased emphasis on data quantity has made it challenging to assess the quality of data. We introduce Data Maps, a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instan...