Exploring structural diversity across the protein universe with The Encyclopedia of Domains

bioRxiv (2024)

Abstract
The AlphaFold Protein Structure Database (AFDB) contains full-length predictions of the three-dimensional structures of almost every protein in UniProt. Because protein function is closely linked to structure, the AFDB is poised to revolutionise our understanding of biology, evolution and more. Protein structures are composed of domains, independently folding units that can be found in multiple structural contexts and functional roles. The AFDB’s potential remains untapped due to the difficulty of characterising 200 million structures. Here we present The Encyclopedia of Domains, or TED, which combines state-of-the-art deep learning-based domain parsing and structure comparison algorithms to segment and classify domains across the whole AFDB. TED describes over 370 million domains, over 100 million more than are detectable by sequence-based methods. Nearly 80% of TED domains share similarities to known superfamilies in CATH, greatly expanding the set of known protein structural domains. We uncover over 10,000 previously unseen structural interactions between superfamilies, expand domain coverage to over 1 million taxa, and unveil thousands of architectures and folds across the unexplored continuum of protein fold space. We expect TED to be a valuable resource that provides a functional interface to the AFDB, empowering it to be useful for a multitude of downstream analyses.

### Competing Interest Statement

The authors have declared no competing interest.

The Encyclopedia of Domains (TED) structural domain assignments for AlphaFold Database v4 will be available as a Zenodo deposition (DOI: 10.5281/zenodo.10848710) upon publication. The deposition contains domain assignments for TED, PDB files for novel folds and individual domain assignments from Chainsaw, Merizo and UniDoc to facilitate further benchmarking efforts.