Machine Learning Models to Interrogate Proteome-wide Cysteine Ligandabilities

bioRxiv (Cold Spring Harbor Laboratory)(2023)

Abstract
Machine learning (ML) identification of covalently ligandable sites may significantly accelerate targeted covalent inhibitor discovery and expand the druggable proteome space. Here we report the development of tree-based models and convolutional neural networks trained on a newly curated database (LigCys3D) of over 1,000 liganded cysteines in nearly 800 proteins, represented by over 10,000 X-ray structures reported in the Protein Data Bank (PDB). Tests on unseen data yielded an AUC (area under the receiver operating characteristic curve) of 94%, demonstrating the high predictive power of the models. Interestingly, application to the proteins evaluated by activity-based protein profiling (ABPP) experiments in cell lines gave a lower AUC of 72%. Analysis revealed significant discrepancies between the structural environments of the ligandable cysteines captured by X-ray crystallography and those determined by ABPP. This surprising finding warrants further investigation and may have implications for future drug discovery. We discuss ways to improve the models and project future directions. Our work represents a first step towards the ML-led integration of big genome data, structure models, and chemoproteomic experiments to annotate the human proteome space for next-generation drug discovery.
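As a reminder of what the reported AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive (ligandable) site receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = ligandable cysteine, 0 = not ligandable (hypothetical values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 94% and 72% figures above should be read.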
Keywords
cysteine, machine learning, models, proteome-wide