Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

EMNLP 2020, pp. 9275-9293, 2020.


Abstract:

Large datasets have become commonplace in NLP research. However, the increased emphasis on data quantity has made it challenging to assess the quality of data. We introduce Data Maps, a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instan...
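The abstract is cut off before it says how the model's per-instance behavior is summarized. The sketch below is one illustrative way to do it, under stated assumptions: each training instance is summarized by the mean (confidence) and population standard deviation (variability) of the probability the model assigns to its gold label across epochs, plus the fraction of epochs in which it is predicted correctly. `training_dynamics`, `most_ambiguous`, and the toy inputs are hypothetical names for illustration. Recording the per-epoch probabilities is assumed to happen in the training loop.

```python
from statistics import mean, pstdev
from typing import List, NamedTuple, Sequence


class InstanceStats(NamedTuple):
    confidence: float   # mean gold-label probability across epochs (assumed definition)
    variability: float  # population std. dev. of that probability (assumed definition)
    correctness: float  # fraction of epochs in which the prediction was correct


def training_dynamics(
    gold_probs: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
) -> List[InstanceStats]:
    """Hypothetical helper.

    gold_probs[i][e]: probability of instance i's gold label after epoch e.
    correct[i][e]: whether the prediction for instance i was right after epoch e.
    """
    stats = []
    for probs, flags in zip(gold_probs, correct):
        if not probs or len(probs) != len(flags):
            raise ValueError("each instance needs one probability and one flag per epoch")
        stats.append(InstanceStats(
            confidence=mean(probs),
            variability=pstdev(probs),
            correctness=sum(flags) / len(flags),
        ))
    return stats


def most_ambiguous(stats: Sequence[InstanceStats], fraction: float) -> List[int]:
    """Hypothetical helper: indices of the top `fraction` of instances by variability."""
    k = max(1, round(fraction * len(stats)))
    order = sorted(range(len(stats)), key=lambda i: stats[i].variability, reverse=True)
    return order[:k]


# Toy usage: three instances tracked over three epochs (made-up numbers).
probs = [[0.9, 0.95, 0.97], [0.2, 0.8, 0.4], [0.1, 0.05, 0.1]]
flags = [[True, True, True], [False, True, False], [False, False, False]]
stats = training_dynamics(probs, flags)
print(most_ambiguous(stats, 0.33))  # [1]
```

In this toy run, instance 1, whose gold-label probability swings between epochs, has the highest variability. Instances 0 and 2 are consistently high and consistently low, respectively.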
