Sequence vs. structure: delving deep into data driven protein function prediction

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Predicting protein function is a longstanding challenge with significant scientific implications. The success of amino acid sequence-based learning methods depends on the relationship between sequence, structure, and function. However, recent advances in AlphaFold have made highly accurate protein structure data more readily available, prompting a fundamental question: if sufficient experimental and predicted structures are available, should structure-based learning methods replace sequence-based learning methods for predicting protein function, given the intuition that a protein's structure is more closely related to its function than its amino acid sequence is? To answer this question, we explore several key factors that affect function prediction accuracy. First, we learn protein representations using state-of-the-art graph neural networks (GNNs) and compare graph construction (GC) methods at the residue and atomic levels. Second, we investigate whether protein structures generated by AlphaFold are as effective as experimental structures for function prediction when protein graphs are used as input. Finally, we compare the accuracy of sequence-only, structure-only, and sequence-structure fusion-based learning methods for predicting protein function. Additionally, we make several observations, provide useful tips, and share code and datasets to encourage further research and enhance reproducibility.

Competing Interest Statement: The authors have declared no competing interest.
Keywords
protein, prediction, structure, function, data-driven