Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset
CoRR (2024)
Abstract
Background: Health datasets from clinical sources do not reflect the breadth
and diversity of disease in the real world, impacting research, medical
education, and artificial intelligence (AI) tool development. Dermatology is a
suitable area to develop and test a new and scalable method to create
representative health datasets.
Methods: We used Google Search advertisements to invite contributions to an
open access dataset of images of dermatology conditions, demographic and
symptom information. With informed contributor consent, we describe and release
this dataset containing 10,408 images from 5,033 contributions from internet
users in the United States over 8 months starting March 2023. The dataset
includes dermatologist condition labels as well as estimated Fitzpatrick Skin
Type (eFST) and Monk Skin Tone (eMST) labels for the images.
Results: We received a median of 22 submissions/day (IQR 14-30). Female
(66.72% [...] the dataset compared to the US population, and 32.6% [...]
non-White racial or ethnic identity. Over 97.5% [...] images of skin
conditions. Dermatologist confidence in assigning a differential
diagnosis increased with the number of available variables, and showed a weaker
correlation with image sharpness (Spearman's P values <0.001 and 0.01
respectively). Most contributions were short-duration (54% [...] ago) and
89% [...] eMST distributions reflected the geographical origin of the dataset.
The dataset is available at github.com/google-research-datasets/scin.
Conclusion: Search ads are effective at crowdsourcing images of health
conditions. The SCIN dataset bridges important gaps in the availability of
representative images of common skin conditions.