Post-Translational Modification Prediction via Prompt-Based Fine-Tuning of a GPT-2 Model

Crossref (2024)

Abstract
Post-translational modifications (PTMs) are pivotal in modulating protein functions, influencing key cellular processes such as signaling, localization, and protein degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that uses prompt-based fine-tuning to improve its accuracy and generalizability in predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts an unsupervised learning approach to identify PTMs. It uses a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer, elucidating sequence motifs essential for molecular recognition and modification variability. Furthermore, we conducted analyses to investigate the effects of mutations at or near PTM sites, offering deeper insights into protein functionality. Our analysis encompasses a comprehensive dataset of 388,084 modification sites across 19 distinct PTM types, facilitating the identification of novel PTM sites. Comparative assessments reveal that PTMGPT2 outperforms existing methods by an average of 5.45% in Matthews correlation coefficient (MCC), underscoring its potential for identifying novel therapeutic strategies, disease associations, and drug targets.
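
The reported comparison metric, MCC, uses all four confusion-matrix cells, which matters for PTM datasets where unmodified residues vastly outnumber modified ones. A minimal evaluation sketch with scikit-learn, using made-up labels:

```python
# Matthews correlation coefficient, the metric behind the reported 5.45%
# average improvement. The labels below are made-up examples.
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # 1 = modified site, 0 = unmodified
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")
```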