Range of Radiologist Performance in a Population-based Screening Cohort of 1 Million Digital Mammography Examinations.

Mattie Salim, Karin Dembrower, Martin Eklund, Peter Lindholm, Fredrik Strand

Radiology (2020)

Abstract

Background: There is great interest in developing artificial intelligence (AI)-based computer-aided detection (CAD) systems for use in screening mammography. Comparative performance benchmarks from true screening cohorts are needed.

Purpose: To determine the range of human first-reader performance measures within a population-based screening cohort of 1 million screening mammograms to gauge the performance of emerging AI CAD systems.

Materials and Methods: This retrospective study consisted of all screening mammograms in women aged 40-74 years in Stockholm County, Sweden, who underwent screening with full-field digital mammography between 2008 and 2015. There were 110 interpreting radiologists, of whom 24 were defined as high-volume readers (ie, those who interpreted more than 5000 annual screening mammograms). A true-positive finding was defined as the presence of a pathology-confirmed cancer within 12 months. Performance benchmarks included sensitivity and specificity, examined per quartile of radiologists' performance. First-reader sensitivity was determined for each tumor subgroup, overall and by quartile of high-volume reader sensitivity. Screening outcomes were examined based on the first reader's sensitivity quartile with 10 000 screening mammograms per quartile. Linear regression models were fitted to test for a linear trend across quartiles of performance.

Results: A total of 418 041 women (mean age, 54 years ± 10 [standard deviation]) were included, and 1 186 045 digital mammograms were evaluated, with 972 899 assessed by high-volume readers. Overall sensitivity was 73% (95% confidence interval [CI]: 69%, 77%), and overall specificity was 96% (95% CI: 95%, 97%). The mean values per quartile of high-volume reader performance ranged from 63% to 84% for sensitivity and from 95% to 98% for specificity. The sensitivity difference was very large for basal cancers, with the least sensitive and most sensitive high-volume readers detecting 53% and 89% of cancers, respectively (P < .001).

Conclusion: Benchmarks showed a wide range of performance differences between high-volume readers. Sensitivity varied by tumor characteristics.

© RSNA, 2020
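
As a rough illustration of the benchmark measures named in the abstract, the following Python sketch computes per-reader sensitivity and specificity and then averages them within quartiles of reader performance. It is not the authors' analysis code: the helper names (`sensitivity`, `specificity`, `quartile_means`) and the eight readers' counts are hypothetical, chosen only to make the arithmetic easy to follow, and the equal-sized quartile split is an assumption about how the grouping might be done.

```python
from statistics import mean


def sensitivity(tp, fn):
    """Share of examinations with confirmed cancer that the reader flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of cancer-free examinations that the reader did not flag."""
    return tn / (tn + fp)


def quartile_means(values):
    """Sort values, split them into four near-equal groups, return each mean."""
    ordered = sorted(values)
    n = len(ordered)
    bounds = [round(i * n / 4) for i in range(5)]
    return [mean(ordered[bounds[i]:bounds[i + 1]]) for i in range(4)]


# Hypothetical counts per reader (NOT study data):
# (true positives, false negatives, true negatives, false positives)
readers = [
    (60, 40, 9500, 500),
    (65, 35, 9550, 450),
    (70, 30, 9600, 400),
    (72, 28, 9620, 380),
    (75, 25, 9650, 350),
    (78, 22, 9700, 300),
    (82, 18, 9750, 250),
    (86, 14, 9800, 200),
]

sens = [sensitivity(tp, fn) for tp, fn, tn, fp in readers]
spec = [specificity(tn, fp) for tp, fn, tn, fp in readers]

sens_quartiles = quartile_means(sens)
spec_quartiles = quartile_means(spec)

for q, (s, p) in enumerate(zip(sens_quartiles, spec_quartiles), start=1):
    print(f"Quartile {q}: mean sensitivity {s:.1%}, mean specificity {p:.1%}")
```

Under the study's definition, a true positive here would be an examination flagged by the first reader in a woman with pathology-confirmed cancer within 12 months. Note that this sketch sorts sensitivity and specificity independently, so the two quartile rankings need not contain the same readers.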
