Folding Large Proteins by Ultra-Deep Learning

BCB(2017)

Abstract
Ab initio protein folding is one of the most challenging problems in computational biology. The popular fragment assembly method can fold only some small proteins. Contact-assisted folding has recently made some progress, but it requires accurate contact prediction, which existing methods achieve only on proteins with a very large number (>500 or 1000) of sequence homologs. To handle proteins without so many sequence homologs, we have developed a novel deep learning model for contact prediction that concatenates two deep residual neural networks (ResNet), the architecture that performed best in the 2015 computer vision challenges. The first ResNet applies convolutional transformations to 1-dimensional features, and the second applies convolutional transformations to 2-dimensional information, including the output of the first. Experimental results suggest that our deep learning method greatly outperforms existing contact prediction methods and doubles the accuracy of pure co-evolution methods on proteins without many sequence homologs. Our method ranked 1st in total F1 score in the latest CASP competition (i.e., CASP12), although at that time (May-July 2016) it was not fully implemented. Our predicted contacts also lead to much more accurate contact-assisted folding. Blindly tested in the weekly benchmark CAMEO (which can be interpreted as a fully automated CASP) since October 2016, our fully automated web server implementing this method has successfully folded many large hard targets (up to 600 residues) that have neither good templates nor many sequence homologs. Our large-scale benchmark indicates that ab initio folding based on predicted contacts can now correctly fold more than 2/3 of randomly chosen proteins. We have also applied this method to membrane protein contact prediction, where it produces very good results in terms of both contact prediction accuracy and folding.
An important finding is that, even when trained only on non-membrane proteins, our deep model works very well on membrane protein contact prediction and folding. This is because the model learns to predict contacts from contact occurrence patterns (which are shared between membrane and non-membrane proteins) rather than from sequence similarity. The method can also be extended to protein-protein interaction prediction, protein complex prediction and protein docking. Our web server implementing this method is publicly available at http://raptorx.uchicago.edu/ContactMap/ . For technical and result details, please see our papers [1-2].
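
To make the two-stage architecture described above more concrete, the following is a minimal NumPy sketch of the data flow only, not the authors' implementation. It assumes that the per-residue output of the 1-dimensional network is turned into pairwise features by concatenating the vectors of residues i and j; the abstract does not state how this conversion is done. All function names, channel counts, kernel sizes and the single residual block per stage are hypothetical, and the weights are random and untrained.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, w):
    """x: (L, C_in), w: (k, C_in, C_out) with odd k -> (L, C_out)."""
    k, L = w.shape[0], x.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    return sum(xp[d:d + L] @ w[d] for d in range(k))

def conv2d_same(x, w):
    """x: (L, L, C_in), w: (k, k, C_in, C_out) with odd k -> (L, L, C_out)."""
    k, L = w.shape[0], x.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    return sum(xp[a:a + L, b:b + L] @ w[a, b]
               for a in range(k) for b in range(k))

def residual_block(x, w1, w2, conv):
    """Identity shortcut plus two activation/convolution layers."""
    return x + conv(relu(conv(relu(x), w1)), w2)

def pairwise_from_1d(h):
    """(L, C) -> (L, L, 2C): entry (i, j) is the concatenation of h[i] and h[j].
    Assumed conversion; not specified in the abstract."""
    L, C = h.shape
    hi = np.broadcast_to(h[:, None, :], (L, L, C))
    hj = np.broadcast_to(h[None, :, :], (L, L, C))
    return np.concatenate([hi, hj], axis=-1)

def init_params(rng, c1d, c2d, hidden):
    """Random, untrained weights with hypothetical shapes."""
    def w(*shape):
        return 0.1 * rng.normal(size=shape)
    return {
        "proj1d": w(1, c1d, hidden),
        "res1d_a": w(3, hidden, hidden),
        "res1d_b": w(3, hidden, hidden),
        "proj2d": w(1, 1, 2 * hidden + c2d, hidden),
        "res2d_a": w(3, 3, hidden, hidden),
        "res2d_b": w(3, 3, hidden, hidden),
        "out": w(1, 1, hidden, 1),
    }

def predict_contacts(seq_feats, pair_feats, params):
    """seq_feats: (L, c1d), pair_feats: (L, L, c2d) -> (L, L) probabilities."""
    # Stage 1: residual network over 1-dimensional (per-residue) features.
    h = conv1d_same(seq_feats, params["proj1d"])
    h = residual_block(h, params["res1d_a"], params["res1d_b"], conv1d_same)
    # Stage 2: residual network over 2-dimensional (per-pair) information,
    # which includes the output of stage 1.
    z = np.concatenate([pairwise_from_1d(h), pair_feats], axis=-1)
    z = conv2d_same(z, params["proj2d"])
    z = residual_block(z, params["res2d_a"], params["res2d_b"], conv2d_same)
    logits = conv2d_same(z, params["out"])[..., 0]
    logits = 0.5 * (logits + logits.T)  # contact maps are symmetric
    return 1.0 / (1.0 + np.exp(-logits))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 20  # sequence length of a toy protein
    params = init_params(rng, c1d=26, c2d=3, hidden=8)
    probs = predict_contacts(rng.normal(size=(L, 26)),
                             rng.normal(size=(L, L, 3)), params)
    print(probs.shape)
```

In this sketch the 1-dimensional input would hold per-residue features and the 2-dimensional input would hold per-pair features such as co-evolution scores; the specific feature sets are described in the cited papers rather than in this abstract.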