Vygotsky Distance: Measure for Benchmark Task Similarity
CoRR(2024)
摘要
Evaluation plays a significant role in modern natural language processing.
Most modern NLP benchmarks consist of arbitrary sets of tasks that neither
guarantee any generalization potential for the model once applied outside the
test set nor try to minimize the resource consumption needed for model
evaluation. This paper presents a theoretical instrument and a practical
algorithm to calculate similarity between benchmark tasks, we call this
similarity measure "Vygotsky distance". The core idea of this similarity
measure is that it is based on relative performance of the "students" on a
given task, rather that on the properties of the task itself. If two tasks are
close to each other in terms of Vygotsky distance the models tend to have
similar relative performance on them. Thus knowing Vygotsky distance between
tasks one can significantly reduce the number of evaluation tasks while
maintaining a high validation quality. Experiments on various benchmarks,
including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that a vast
majority of NLP benchmarks could be at least 40
included. Most importantly, Vygotsky distance could also be used for the
validation of new tasks thus increasing the generalization potential of the
future NLP models.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要