Deep neural network improves fracture detection by clinicians.

Proceedings of the National Academy of Sciences of the United States of America (2018)


Abstract

Suspected fractures are among the most common reasons for patients to visit emergency departments (EDs), and X-ray imaging is the primary diagnostic tool used by clinicians to assess patients for fractures. Missing a fracture in a radiograph often has severe consequences for patients, resulting in delayed treatment and poor recovery of function […]

Introduction
  • Suspected fractures are among the most common reasons for patients to visit emergency departments (EDs), and X-ray imaging is the primary diagnostic tool used by clinicians to assess patients for fractures.
  • Radiographic interpretation often takes place in environments without qualified colleagues available for second opinions [2]
  • Such circumstances increase the risk of inaccurate identification of fractures on radiographs and often negatively affect patient care [3,4,5,6], especially in emergency departments, where missed fractures account for between 41% and 80% of reported diagnostic errors [5, 7, 8].
  • The authors demonstrate that when emergency medicine clinicians are provided with the assistance of the trained model, their ability to accurately detect fractures significantly improves
Highlights
  • Suspected fractures are among the most common reasons for patients to visit emergency departments (EDs), and X-ray imaging is the primary diagnostic tool used by clinicians to assess patients for fractures
  • We demonstrate that when emergency medicine clinicians are provided with the assistance of the trained model, their ability to accurately detect fractures significantly improves
  • On the subset of images in Test Set 2 where there is no uncertainty about the reference standard, the model achieved an area under the curve (AUC) of 0.994 (n = 1,243; 95% CI, 0.989–0.996)
  • This study showed that a deep learning model can be trained to detect wrist fractures in radiographs with diagnostic accuracy similar to that of senior subspecialized orthopedic surgeons
  • This study showed that, when emergency medicine clinicians are provided with the assistance of the trained model, their ability to detect wrist fractures can be significantly improved, diminishing diagnostic errors and improving the clinicians’ efficiency
  • The experiments described in this paper showed that the proposed model can be used to assist practicing clinicians and help improve their performance in identifying fractures in radiographs
Methods
  • For the purpose of model development, the authors retrospectively obtained a collection of radiographs from a specialty hospital in the United States.
  • A group of senior subspecialized orthopedic surgeons provided clinical interpretations for each radiograph in the collection.
  • The authors clinically tested the trained model’s performance on two test datasets: (i) a random subset of the development dataset’s wrist radiographs that had been withheld from the model during training and validation and (ii) a separate dataset of all wrist radiographs obtained from the same hospital over a 3-mo period.
  • To determine whether the trained model can help emergency medicine clinicians improve at fracture detection, the authors ran a controlled experiment with emergency medicine clinicians, in which the authors evaluated each clinician’s ability to detect fractures in wrist radiographs both with and without the availability of the model’s output while making their interpretations
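To make the experimental protocol above concrete, here is a minimal analysis sketch for a within-subjects aided-vs-unaided comparison: per-clinician sensitivity and specificity are computed in each condition and then compared across clinicians. The toy data, variable names, and the paired t-test are illustrative assumptions, not the authors' analysis.

```python
# Sketch of a within-subjects comparison: each clinician reads the same radiographs
# with and without model assistance; per-clinician sensitivity is compared across
# conditions. The reads dictionary, its contents, and the paired t-test are assumptions.
import numpy as np
from scipy import stats

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from binary reference labels and clinician calls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# reads[clinician][condition] = (reference_labels, clinician_calls); toy data below.
rng = np.random.default_rng(0)
y_ref = rng.integers(0, 2, size=300)                     # reference standard per image
reads = {
    f"clinician_{i}": {
        "unaided": (y_ref, np.where(rng.random(300) < 0.85, y_ref, 1 - y_ref)),
        "aided":   (y_ref, np.where(rng.random(300) < 0.93, y_ref, 1 - y_ref)),
    }
    for i in range(5)
}

sens = {c: {} for c in reads}
for clinician, conditions in reads.items():
    for condition, (y_true, y_pred) in conditions.items():
        sens[clinician][condition], _ = sensitivity_specificity(y_true, y_pred)

unaided = [sens[c]["unaided"] for c in reads]
aided = [sens[c]["aided"] for c in reads]
t, p = stats.ttest_rel(aided, unaided)                   # paired across clinicians
print(f"mean sensitivity unaided={np.mean(unaided):.3f}, aided={np.mean(aided):.3f}, p={p:.3f}")
```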
Results
  • ROC curves for the trained model on the two test sets are shown in Fig. 3.
  • On the subset of images in Test Set 2 where there is no uncertainty about the reference standard, the model achieved an AUC of 0.994 (n = 1,243; 95% CI, 0.989–0.996).
  • This indicates a very high level of agreement between the model’s assessment of each radiograph and the senior subspecialized orthopedic hand surgeons who created the reference standard.
  • The model is generally able to precisely identify the presence and location of visible fractures
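The AUCs above are reported with 95% confidence intervals; given that the reference list includes DiCiccio and Efron's work on bootstrap confidence intervals, a percentile-bootstrap sketch is shown below. The synthetic labels and scores and the helper bootstrap_auc_ci are assumptions for illustration, not the study's data or exact procedure.

```python
# Minimal sketch: AUC with a percentile-bootstrap 95% CI. The labels and scores
# below are synthetic placeholders, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1400                                          # e.g., a Test Set 2-sized sample
y_true = rng.integers(0, 2, size=n)               # hypothetical reference labels
y_score = np.clip(y_true * 0.8 + rng.normal(0.1, 0.25, size=n), 0, 1)  # model scores

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC: resample cases with replacement."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:       # need both classes to compute AUC
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)

auc, (lo, hi) = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {auc:.3f} (95% CI, {lo:.3f}-{hi:.3f})")
```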
Conclusion
  • This study showed that a deep learning model can be trained to detect wrist fractures in radiographs with diagnostic accuracy similar to that of senior subspecialized orthopedic surgeons.
  • There are multiple factors that can contribute to radiographic misinterpretations of fractures by clinicians, including physician fatigue, lack of subspecialized expertise, and inconsistency among reading physicians [2, 4, 5, 24]
  • The approach of this investigation is to use machine learning algorithms trained by experts in the field to assist less experienced clinicians, improving both their performance and efficiency.
  • The misinterpretation rate of the practicing emergency medicine clinicians was reduced by approximately one-half through the assistance of the model
Funding
  • The project is funded by Imagen Technologies
Study subjects and analysis
senior subspecialized orthopedic surgeons: 18
In this work, we developed a deep neural network to detect and localize fractures in radiographs. We trained it to accurately emulate the expertise of 18 senior subspecialized orthopedic surgeons by having them annotate 135,409 radiographs. We then ran a controlled experiment with emergency medicine clinicians to evaluate their ability to detect fractures in wrist radiographs with and without the assistance of the deep learning model
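As a rough illustration of a detect-and-localize network of the kind described above (the reference list includes U-Net and the Adam optimizer), the sketch below defines a tiny encoder-decoder with a per-pixel heatmap head and an image-level fracture-probability head. The architecture, layer sizes, losses, and dummy tensors are simplified assumptions, not the authors' model.

```python
# Tiny sketch of a detect-and-localize network: a small encoder-decoder producing a
# per-pixel fracture heatmap plus an image-level fracture logit. Everything here
# (layer sizes, losses, dummy data) is an illustrative assumption.
import torch
import torch.nn as nn

class TinyFractureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = nn.Sequential(nn.Conv2d(32 + 16, 16, 3, padding=1), nn.ReLU())
        self.heatmap_head = nn.Conv2d(16, 1, 1)           # per-pixel fracture logit
        self.cls_head = nn.Linear(32, 1)                  # image-level fracture logit

    def forward(self, x):
        e1 = self.enc1(x)                                  # (B, 16, H, W)
        e2 = self.enc2(self.pool(e1))                      # (B, 32, H/2, W/2)
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        heatmap = self.heatmap_head(d1)                    # (B, 1, H, W)
        pooled = e2.mean(dim=(2, 3))                       # global average pool
        return heatmap, self.cls_head(pooled)

# One optimization step with Adam; pixel-wise and image-level BCE losses are assumed.
model = TinyFractureNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
x = torch.randn(2, 1, 64, 64)            # dummy radiograph batch
mask = torch.zeros(2, 1, 64, 64)         # dummy expert annotation masks
label = torch.tensor([[1.0], [0.0]])     # dummy image-level fracture labels
heatmap, cls_logit = model(x)
loss = bce(heatmap, mask) + bce(cls_logit, label)
loss.backward()
opt.step()
```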

clinicians defining the reference standard: 3
The experiment followed a within-subjects design to evaluate the performance of a number of practicing emergency medicine clinicians on a sequence of 300 radiographs randomly chosen from Test Set 2, where the independent variable was whether or not the clinician could view the model’s predictions when interpreting the radiograph. All the clinicians were shown the same set of 300 radiographs, although the order of the radiographs was randomized per clinician; 266 of 300 radiographs had no disagreements among the three clinicians used to define the reference standard about the presence or absence of a fracture. We recruited 40 practicing emergency medicine clinicians, of whom 16 were physician assistants (PAs) and 24 were medical doctors (MDs)

practicing emergency medicine clinicians: 40
All the clinicians were shown the same set of 300 radiographs, although the order of the radiographs was randomized per clinician; 266 of 300 radiographs had no disagreements among the three clinicians used to define the reference standard about the presence or absence of a fracture. We recruited 40 practicing emergency medicine clinicians, of whom 16 were physician assistants (PAs) and 24 were medical doctors (MDs). Any clinician who had an across-condition sensitivity index (d score) of 0 ± 0.05 was dropped from the analysis
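The exclusion rule above is based on the sensitivity index d′ from signal detection theory, d′ = Z(hit rate) − Z(false-alarm rate). Below is a minimal sketch of computing d′ per clinician and applying the 0 ± 0.05 cutoff; the rate-smoothing correction and the toy counts are assumptions for illustration.

```python
# Sketch: per-clinician sensitivity index d' = Z(hit rate) - Z(false-alarm rate),
# used to drop clinicians whose across-condition d' is 0 +/- 0.05 as described above.
# The smoothing constant and toy counts are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections, eps=0.5):
    """d' with a small correction so rates of exactly 0 or 1 stay finite."""
    hit_rate = (hits + eps) / (hits + misses + 2 * eps)
    fa_rate = (false_alarms + eps) / (false_alarms + correct_rejections + 2 * eps)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Toy across-condition confusion counts per clinician (hits, misses, FAs, CRs).
clinician_counts = {
    "clinician_A": (140, 20, 15, 125),
    "clinician_B": (80, 80, 80, 80),     # chance-level reader
}
kept = {}
for name, counts in clinician_counts.items():
    d = d_prime(*counts)
    if abs(d) <= 0.05:                   # exclusion rule: d' of 0 +/- 0.05
        print(f"dropping {name}: d' = {d:.3f}")
    else:
        kept[name] = d
print("kept:", {k: round(v, 3) for k, v in kept.items()})
```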

Test Set 1 radiographs: 3,500
Results ROC curves for the trained model on the two test sets are shown in Fig. 3. On Test Set 1, the model achieved an AUC of 0.967 (n = 3,500; 95% CI, 0.960–0.973). On Test Set 2, the model achieved an AUC of 0.975 (n = 1,400; 95% CI, 0.965–0.982)

Test Set 2 radiographs: 1,400
On Test Set 1, the model achieved an AUC of 0.967 (n = 3,500; 95% CI, 0.960–0.973). On Test Set 2, the model achieved an AUC of 0.975 (n = 1,400; 95% CI, 0.965–0.982). On the subset of images in Test Set 2 where there is no uncertainty about the reference standard (no interexpert disagreement), the model achieved an AUC of 0.994 (n = 1,243; 95% CI, 0.989–0.996)

Test Set 2 radiographs with no interexpert disagreement: 1,243
On Test Set 2, the model achieved an AUC of 0.975 (n = 1,400; 95% CI, 0.965–0.982). On the subset of images in Test Set 2 where there is no uncertainty about the reference standard (no interexpert disagreement), the model achieved an AUC of 0.994 (n = 1,243; 95% CI, 0.989–0.996). This indicates a very high level of agreement between the model’s assessment of each radiograph and the senior subspecialized orthopedic hand surgeons who created the reference standard

physician assistants: 15
[Figure legend] Average unaided PA (n = 15) and average aided PA (n = 15).

medical doctors: 24
[Figure legend] Average unaided MD (n = 24) and average aided MD (n = 24), shown alongside the model (0.99 AUC).

Reference
  • Berlin L (2001) Defending the "missed" radiographic diagnosis. Am J Roentgenol 176:317–322.
  • Hallas P, Ellingsen T (2006) Errors in fracture diagnoses in the emergency department: Characteristics of patients and diurnal variation. BMC Emerg Med 6:4.
  • Kachalia A, et al. (2007) Missed and delayed diagnoses in the emergency department: A study of closed malpractice claims from 4 liability insurers. Ann Emerg Med 49:196–205.
  • Wei CJ, et al. (2006) Systematic analysis of missed extremity fractures in emergency radiology. Acta Radiologica 47:710–717.
  • Guly HR (2001) Diagnostic errors in an accident and emergency department. Emerg Med J 18:263–269.
  • Whang JS, Baker SR, Patel R, Luk L, Castro A (2013) The causes of medical malpractice suits against radiologists in the United States. Radiology 266:548–554.
  • Williams SM, Connelly DJ, Wadsworth S, Wilson DJ (2000) Radiological review of accident and emergency radiographs: A 1-year audit. Clin Radiol 55:861–865.
  • Leeper WR, et al. (2013) The role of trauma team leaders in missed injuries: Does specialty matter? J Trauma Acute Care Surg 75:387–390.
  • Lehman C, et al. (2015) Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med 175:1828–1837.
  • Taylor P, Potts HW (2008) Computer aids and human second reading as interventions in screening mammography: Two systematic reviews to compare effects on cancer detection and recall rate. Eur J Cancer 44:798–807.
  • Khoo LA, Taylor P, Given-Wilson RM (2005) Computer-aided detection in the United Kingdom national breast screening programme: Prospective study. Radiology 237:444–449.
  • Azavedo E, Zackrisson S, Mejare I, Heibert Arnlind M (2012) Is single reading with computer-aided detection (CAD) as good as double reading in mammography screening? A systematic review. BMC Med Imaging 12:22.
  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444.
  • Gulshan V, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J Am Med Assoc 316:2402–2410.
  • Esteva A, et al. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118.
  • Sirinukunwattana K, et al. (2016) Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging 35:1196–1206.
  • Ciresan DC, Giusti A, Gambardella LM, Schmidhuber J (2013) Mitosis detection in breast cancer histology images with deep neural networks. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, Lecture Notes in Computer Science, eds Mori K, Sakuma I, Sato Y, Barillot C, Navab N (Springer, Berlin), Vol 8150, pp 411–418.
  • Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, eds Navab N, Hornegger J, Wells W, Frangi A (Springer, Cham, Germany), Vol 9351, pp 234–241.
  • Kingma DP, Ba JL (2015) Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations 2015. arXiv:1412.6980v9. Preprint, posted December 22, 2014.
  • DiCiccio TJ, Efron B (1996) Bootstrap confidence intervals. Stat Sci 11:189–228.
  • Doyle AJ, Le Fevre J, Anderson GD (2005) Personal computer versus workstation display: Observer performance in detection of wrist fractures on digital radiographs. Radiology 237:872–877.
  • Espinosa JA, Nolan TW (2000) Reducing errors made by emergency physicians in interpreting radiographs: Longitudinal study. Br Med J 320:737–740.
  • Lufkin KC, Smith SW, Matticks CA, Brunette DD (1998) Radiologists' review of radiographs interpreted confidently by emergency physicians infrequently leads to changes in patient management. Ann Emerg Med 31:202–207.
  • Juhl M, Møller-Madsen B, Jensen J (1990) Missed injuries in an orthopaedic department. Injury 21:110–112.
  • Kneusel RT, Mozer MC (2017) Improving human-machine cooperative visual search with soft highlighting. ACM Trans Appl Percept 15:1–21.