InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions

crossref(2024)

Abstract
Large language models are renowned for their efficacy in capturing intricate patterns, including co-evolutionary relationships and underlying protein languages. However, current methodologies often fall short in illustrating the emergence of genomic insertions, duplications, and insertions/deletions (indels), which account for approximately 14% of human pathogenic mutations. Given that structure dictates function, mutated proteins with similar structures are more likely to persist throughout biological evolution. Motivated by this, we leverage cross-modality alignment and instruction fine-tuning techniques inspired by large language models to align a generative protein language model with protein structure instructions. Specifically, we present a method for generating variable-length and diverse proteins to explore and simulate the complex evolution of life, thereby expanding the repertoire of options for protein engineering. Our proposed protein LM-based approach, InstructPLM, demonstrates significant performance enhancements both in silico and in vitro. On native protein backbones, it achieves a perplexity of 2.68 and a sequence recovery rate of 57.51, surpassing ProteinMPNN by 39.2% and 25.1%, respectively. Furthermore, we validate the efficacy of our model by redesigning PETase and L-MDH. For PETase, all fifteen designed variable-length PETase variants exhibit depolymerization activity, with eleven surpassing the activity level of the wild type. For L-MDH, an enzyme lacking an experimentally determined structure, InstructPLM is able to design functional enzymes from an AF2-predicted structure. Code and model weights of InstructPLM are publicly available.

Competing Interest Statement: The authors have declared no competing interest.
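The two in silico metrics quoted in the abstract, perplexity and sequence recovery, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `perplexity` and `sequence_recovery` are hypothetical, the log-probabilities are assumed to be natural-log per-residue values from some model, and recovery is assumed to be computed position by position on equal-length sequences and reported as a percentage.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood per residue.

    Assumes natural-log probabilities, one per residue of the native sequence.
    """
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def sequence_recovery(designed, native):
    """Percentage of positions where the designed residue matches the native one.

    Assumes both sequences have the same length (no alignment step).
    """
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return 100.0 * matches / len(native)


# Toy inputs, not data from the paper.
toy_ppl = perplexity([math.log(0.25)] * 4)
toy_recovery = sequence_recovery("ACDE", "ACDF")
```

A lower perplexity means the model assigns higher probability to the native residues, while a higher recovery means more designed positions match the native sequence. For variable-length designs, a position-wise comparison like this would need an alignment step first.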