Data Set Terminology of Artificial Intelligence in Medicine: A Historical Review and Recommendation
arXiv (2024)
Abstract
Medicine and artificial intelligence (AI) engineering represent two distinct
fields, each with decades of published history, and each has developed
terminology that it applies in its own specific way. However, when two
distinct fields with overlapping terminology start to collaborate,
miscommunication and misunderstandings can occur. This narrative review aims to
give historical context for these terms, accentuate the importance of clarity
when these terms are used in medical AI contexts, and offer solutions to
mitigate misunderstandings by readers from either field. Through an examination
of historical documents, including articles, writing guidelines, and textbooks,
this review traces the divergent evolution of terms for data sets and their
impact. First, the discordant interpretations of the word 'validation' in
medical and AI contexts are explored. Then the data sets used for AI evaluation
are classified into random-split, cross-validation, temporal,
geographic, internal, and external sets. The accurate and standardized
description of these data sets is crucial for demonstrating the robustness and
generalizability of AI applications in medicine. This review clarifies existing
literature to provide a comprehensive understanding of these classifications
and their implications in AI evaluation. This review then identifies often
misunderstood terms and proposes pragmatic solutions to mitigate terminological
confusion. Among these solutions are the use of standardized terminology such
as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit
definition of data set splitting terminologies in each medical AI research
publication. This review aspires to enhance the precision of communication in
medical AI, thereby fostering more effective and transparent research
methodologies in this interdisciplinary field.
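The splitting strategies named above can be illustrated with a short sketch. The Python code below is a minimal, illustrative example under stated assumptions, not the authors' method: the Record fields (patient_id, site, visit_date), the function names, and the 70/15/15 split fractions are hypothetical choices made only for this example. It shows a random split into training, validation (tuning), and test sets, k-fold cross-validation indices, and temporal and geographic splits that hold out later dates or selected sites for testing.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical fields chosen for illustration only.
    patient_id: str
    site: str          # hospital or region where the data were collected
    visit_date: date   # date the data were acquired


def random_split(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly partition records into training, validation (tuning), and test sets.

    The fractions are arbitrary example values; the test set receives the remainder.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "training": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],  # also called the tuning set
        "test": shuffled[n_train + n_val:],
    }


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n items.

    Each item lands in exactly one validation fold; fold sizes differ by at most one.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def temporal_split(records, cutoff):
    """Records dated before the cutoff are for development; later ones form a temporal test set."""
    dev = [r for r in records if r.visit_date < cutoff]
    test = [r for r in records if r.visit_date >= cutoff]
    return dev, test


def geographic_split(records, held_out_sites):
    """Records from the held-out sites form a geographic test set; the rest are for development."""
    dev = [r for r in records if r.site not in held_out_sites]
    test = [r for r in records if r.site in held_out_sites]
    return dev, test
```

For example, temporal_split(records, date(2020, 7, 1)) keeps records dated before 1 July 2020 for development and reserves the rest as a temporal test set, while geographic_split(records, {"A"}) reserves every record from the hypothetical site "A" for testing.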