Predicting FDA approvability of small-molecule drugs
bioRxiv (2022)
Abstract
A high rate of compound attrition makes drug discovery via conventional methods time-consuming and expensive. Here, we show that machine learning models can be trained to classify compounds into distinct groups according to their status in the drug development process, which can significantly reduce the compound attrition rate. Using molecular structure fingerprints and physicochemical properties as input, our models accurately predicted which drug compounds would proceed to clinical trials, with an area under the receiver operating characteristic curve (AUC) of 0.94 ± 0.01 (mean ± standard deviation). Our models also identified which drugs in clinical trials would be approved by the US Food and Drug Administration (FDA) to go on the market, with an AUC of 0.73 ± 0.02. The predictive power of our models could reduce the attrition rate of preclinical compounds en route to clinical trials from 65%, as with conventional methods, to 12% (with 92% sensitivity) and the clinical trial failure rate from 80–90% to 29% (with 83% sensitivity). The results largely held in additional tests on new clinical trial compounds and new FDA-approved drugs, as well as on drugs uniquely approved for use in Europe and Japan.
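To make the AUC figures quoted above concrete, here is a minimal sketch of how an AUC value and its mean ± standard deviation summary can be computed. It is not the authors' pipeline: the function names `roc_auc` and `summarize`, the toy labels and scores, and the per-run AUC values are all hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation over repeated evaluations
    (assumption: the abstract does not specify the estimator)."""
    return mean(aucs), stdev(aucs)


# Hypothetical toy data: 1 = compound reached the next stage, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUC on toy data: {roc_auc(labels, scores):.3f}")

# Hypothetical AUCs from three repeated train/test splits.
m, sd = summarize([0.93, 0.94, 0.95])
print(f"AUC = {m:.2f} ± {sd:.2f}")
```

In the toy data, eight of the nine positive/negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of examples and is meant only to show the definition; a rank-based computation would be preferable for real data sets.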
SIGNIFICANCE STATEMENT The odds of developing a drug approved by the US Food and Drug Administration (FDA) are slim: the vast majority of drug candidates fail tests for safety and efficacy during drug discovery, which makes the process highly inefficient and costly. Here, we have developed machine learning models that predict with high accuracy which drug compounds are worthy of clinical trials, and that predict which clinical-trial compounds will receive FDA approval with a much higher success rate than the traditional approach achieves. Our computational prediction requires as input only the drug compound’s chemical structure and physicochemical properties. It can help mitigate the long-standing inefficiency of drug discovery.
### Competing Interest Statement
The authors have declared no competing interest.
* FDA: Food and Drug Administration
* MACCS: Molecular Access System
* MF: molecular fingerprint
* ML: machine learning
* PCP: physicochemical property
* REOS: Rapid Elimination of Swill
* t-SNE: t-distributed stochastic neighbor embedding
* TD: toxic compound
* CD: drug in clinical trial
* MD: drug approved by FDA
* MDoff: drug withdrawn from the market
* MDon: drug currently on the market
Keywords
FDA approvability, drugs, small-molecule