FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling

crossref(2024)

Abstract
Assigning appropriate property labels, such as functional terms and catalytic activity, to proteins remains a significant challenge, particularly for non-homologous proteins. In contrast to prior approaches that mostly focused on protein sequence features, we employ a pretrained protein language model to encode sequence features and a natural language model to capture the semantic information of property descriptions. Specifically, we present FAPM, a contrastive model between natural language and protein sequence language, which combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as GO functional terms and catalytic activity predictions, in natural language. Our results show that FAPM has learned superior representations related to protein properties, outperforming protein sequence-based or structure-based models. Our model achieves state-of-the-art performance both on public benchmarks and on experimentally annotated phage proteins that have few known homologous sequences. Furthermore, we demonstrate that the model's flexibility to accept additional free-text prompts as input, such as easily accessible taxonomy information, not only enhances its predictive performance but also improves its explainability. Our methodology presents a novel avenue for exploration, with the potential to supersede current methods that rely on multiple sequence alignment for protein annotation.
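
To make the phrase "contrastive model between natural language and protein sequence language" concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired protein and text embeddings. It is an illustration only: the abstract does not specify FAPM's loss, encoders, or hyperparameters, so the function names (`l2_normalize`, `log_softmax`, `contrastive_loss`), the temperature value, and the toy embeddings are all assumptions, and the embeddings stand in for the outputs of the pretrained protein and language models.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(protein_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical, not confirmed as FAPM's objective).

    protein_embs[i] and text_embs[i] are assumed to describe the same protein;
    every other pairing in the batch is treated as a negative.
    """
    p = [l2_normalize(v) for v in protein_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(p)

    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(p[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Protein-to-text direction: each row should peak on its diagonal entry.
    p2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-protein direction: the same check over columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (p2t + t2p)


# Toy 2-D embeddings (made up for illustration).
proteins = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(proteins, aligned_texts)
shuffled_loss = contrastive_loss(proteins, shuffled_texts)
```

In this toy setup, correctly paired embeddings give a lower loss than shuffled ones, which is the behavior a contrastive objective of this kind is meant to reward.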