Automated Screener Based on Convolutional Neural Network for Randomized Controlled Trials in Chinese Language: A Comparative Study of Different Classification Strategies

Shengkai Chen, Bochun Mao, Yu Xie, Pan Yao, Chunjie Li, Sijin Yang, Li Dong, Bo Li

Crossref (2021)


Abstract

Objective: To explore how modified classification strategies for Chinese biomedical literature influence an automated screener based on a convolutional neural network (CNN).

Methods: Citations of studies indexed as 'Oral Science' and published in Chinese between 2014 and 2018 were retrieved from the China National Knowledge Infrastructure. In addition to dividing the studies into 2 categories (RCTs and non-RCTs), 3-category (RCTs, may-be-RCTs, and non-RCTs) and 5-category (RCTs, randomization-unclear controlled trials, non-randomized clinical trials/studies, non-clinical literature, and unclear) classifications were employed. The multi-category strategies took into account the diversity of study types and the presence of vague expression. As in real-world practice, full-text-needed studies included those that were certainly RCTs and those that might be RCTs but lacked information in their abstracts. Screening and classification were performed independently by 2 experienced researchers. The classification results after peer discussion and/or a senior researcher's decision were used to train the CNN model. The probability thresholds for classifying each category were set at a high sensitivity level. The area under the receiver operating characteristic curve (AUC) was calculated when applicable. A separate sample of citations was used in a prospective comparative trial that compared the sensitivity (SEN) and specificity (SPE) of screening for RCTs, may-be-RCTs, and full-text-needed studies between the algorithms trained with the different strategies and manual screening.

Results: In total, 12,166 citations were used for CNN model training. All 3 training strategies performed well in screening for RCTs, with AUCs higher than 0.99. Training showed that, when screening for RCTs, the 5- and 3-category strategies can yield better performance than the 2-category strategy. When screening for may-be-RCTs and full-text-needed studies, the 5-category model achieved better SENs, while the 3-category model achieved higher SPEs. The comparative trial with 1,422 samples presented similar results.

Conclusion: The CNN algorithm shows promising results in the automatic screening of Chinese literature. Multi-category training strategies that consider different study types and vague expression are more suitable for CNN training and can help achieve better screening sensitivity and specificity.
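The abstract reports sensitivity (SEN) and specificity (SPE) at probability thresholds chosen for high sensitivity, but does not say how those thresholds were picked. The following is a minimal sketch of one plausible way to do it, not the authors' procedure: the function names, the target sensitivity, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Return (SEN, SPE) when citations with score >= threshold are flagged.

    labels: 1 for the target class (e.g. RCT), 0 otherwise.
    scores: predicted probability of the target class (hypothetical model output).
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    return sen, spe


def threshold_for_sensitivity(
    labels: List[int], scores: List[float], target_sen: float
) -> float:
    """Pick the highest threshold whose sensitivity reaches target_sen.

    Lowering the threshold can only raise sensitivity, so scanning the
    observed scores from high to low and stopping at the first hit keeps
    specificity as high as possible for the requested sensitivity.
    """
    for t in sorted(set(scores), reverse=True):
        sen, _ = sensitivity_specificity(labels, scores, t)
        if sen >= target_sen:
            return t
    return min(scores)  # fallback: flag everything


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.40, 0.35, 0.10, 0.60, 0.70, 0.05]

t = threshold_for_sensitivity(labels, scores, target_sen=1.0)
sen, spe = sensitivity_specificity(labels, scores, t)
print(f"threshold={t:.2f} SEN={sen:.2f} SPE={spe:.2f}")
```

Choosing the threshold this way trades specificity for sensitivity, which matches a screening setting where missing an RCT costs more than reading an extra abstract.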
