CRISPRCasStack: a Stacking Strategy-Based Ensemble Learning Framework for Accurate Identification of Cas Proteins.

Briefings in bioinformatics(2022)

Abstract
The CRISPR-Cas system is an adaptive immune system widely found in most bacteria and archaea that defends against exogenous gene invasion. Identifying the Cas proteins in CRISPR-Cas systems is one of the most critical steps in exploring and classifying novel CRISPR-Cas systems and their functional diversity. The discovery of novel Cas proteins has also laid the foundation for technologies such as CRISPR-Cas-based gene editing and gene therapy. Accurate and efficient screening of Cas proteins from metagenomic and proteomic sequences nevertheless remains a challenge. For Cas proteins with low sequence conservation, existing homology-based identification tools cannot guarantee accuracy or efficiency. In this paper, we develop a novel stacking-based ensemble learning framework for Cas protein identification, called CRISPRCasStack. In particular, we apply the SHAP (SHapley Additive exPlanations) method to analyze the features used in CRISPRCasStack. Experimental validation and independent testing demonstrate that CRISPRCasStack addresses the accuracy deficiencies and inefficiencies of existing state-of-the-art tools. We also provide a toolkit to accurately identify and analyze potential Cas proteins, Cas operons, CRISPR arrays and CRISPR-Cas loci in prokaryotic sequences. The CRISPRCasStack toolkit is available at https://github.com/yrjia1015/CRISPRCasStack.
Keywords
CRISPR-Cas system, stacking strategy, machine learning, Cas proteins identification