Host transcriptomics and machine learning for secondary bacterial infections in patients with COVID-19: a prospective, observational cohort study

Lancet Microbe (2024)

Abstract
Background Viral respiratory tract infections are frequently complicated by secondary bacterial infections. This study aimed to use machine learning to predict the risk of bacterial superinfection in SARS-CoV-2-positive individuals.

Methods In this prospective, multicentre, observational cohort study done in nine centres in six countries (Australia, Indonesia, Singapore, Italy, Czechia, and France), blood samples and RNA sequencing were used to develop a robust model for predicting secondary bacterial infections in the respiratory tract of patients with COVID-19. Eligible participants were older than 18 years, had known or suspected COVID-19, and had symptoms of a recent respiratory infection. A control cohort of participants without COVID-19 who were older than 18 years and had no infection symptoms was also recruited from one Australian centre. In the pre-analysis phase, data were filtered to include only individuals with complete blood transcriptomics and patient data (ie, age, sex, location, and WHO severity score at the time of sample collection). The dataset was then divided randomly (4:1) into a training set (80%) and a test set (20%). Gene expression data in the training set and control cohort were used for differential expression analysis. Differentially expressed genes, along with WHO severity score, location, age, and sex, were used for feature selection with least absolute shrinkage and selection operator (LASSO) in the training set. For LASSO analysis, samples were excluded if gene expression data were not obtained at study admission, no longitudinal clinical information was available, a bacterial infection was present at the time of study admission, or a fungal infection was detected in the absence of a bacterial infection. LASSO regression was performed using three subsets of predictor variables: patient data alone, gene expression data alone, or a combination of patient data and gene expression data. The accuracy of the resultant models was tested on data from the test set.

Findings Between March, 2020, and October, 2021, we recruited 536 SARS-CoV-2-positive individuals, and between June, 2013, and January, 2020, we recruited 74 participants into the control cohort. After prefiltering analysis and other exclusions, samples from 158 individuals were analysed in the training set and 47 in the test set. The expression of seven host genes (DAPP1, CST3, FGL2, GCH1, CIITA, UPP1, and RN7SL1) in the blood at the time of study admission was identified by LASSO as predictive of the risk of developing a secondary bacterial infection of the respiratory tract more than 24 h after study admission. Specifically, the expression of these genes in combination with a patient's WHO severity score at the time of study enrolment resulted in an area under the curve of 0.98 (95% CI 0.89-1.00), a true positive rate (sensitivity) of 1.00 (95% CI 1.00-1.00), and a true negative rate (specificity) of 0.94 (95% CI 0.89-1.00) in the test cohort. The combination of patient data and host transcriptomics at hospital admission identified all seven individuals in the training and test sets who developed a bacterial infection of the respiratory tract 5-9 days after hospital admission.

Interpretation These data raise the possibility that host transcriptomics at the time of clinical presentation, together with machine learning, can forward predict the risk of secondary bacterial infections and allow for the more targeted use of antibiotics in viral infection.

Copyright (c) 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
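The analysis steps named in the abstract (a random 4:1 split, LASSO feature selection, and evaluation by area under the curve, sensitivity, and specificity) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the software or the exact LASSO formulation, so a plain linear LASSO fitted by coordinate descent stands in here. All function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import random


def split_4_to_1(sample_ids, seed=0):
    """Randomly divide samples 4:1 into a training set (80%) and a test set (20%)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = (len(ids) * 4) // 5
    return ids[:cut], ids[cut:]


def soft_threshold(value, penalty):
    """Shrink value towards zero; values within [-penalty, penalty] become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent.

    Assumption: a linear LASSO is used purely to show how the L1 penalty
    drives uninformative coefficients to zero (feature selection).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
    return w


def sensitivity_specificity(y_true, y_pred):
    """Return (true positive rate, true negative rate) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): the outcome depends on feature 0 only.
X_toy = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y_toy = [2, -2, 2, -2]
weights = lasso_coordinate_descent(X_toy, y_toy, penalty=0.5)
train_ids, test_ids = split_4_to_1(range(10))
```

In the toy fit, the coefficient of the uninformative second feature is driven to exactly zero while the informative one is shrunk but retained. This is the property that lets LASSO reduce a large set of candidate predictors to a short list, such as the seven genes reported above.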