MMDesign: Multi-Modality Transfer Learning for Generative Protein Design
CoRR (2023)
Abstract
Protein design involves generating protein sequences based on their
corresponding protein backbones. While deep generative models show promise for
learning protein design directly from data, the lack of publicly available
structure-sequence pairings limits their generalization capabilities. Previous
efforts in generative protein design have focused on architectural improvements
and pseudo-data augmentation to overcome this bottleneck. To further address
this challenge, we propose a novel protein design paradigm called MMDesign,
which leverages multi-modality transfer learning. To our knowledge, MMDesign is
the first framework that combines a pretrained structural module with a
pretrained contextual module, using an auto-encoder (AE) based language model
to incorporate prior semantic knowledge of protein sequences. We also introduce
a cross-layer cross-modal alignment algorithm to enable the structural module
to learn long-term temporal information and ensure consistency between
structural and contextual modalities. Experimental results show that MMDesign,
trained only on the small CATH dataset, consistently outperforms the baselines
on various public test sets. To further assess the
biological plausibility of the generated protein sequences and their data
distribution, we present systematic quantitative analysis techniques that
provide interpretability and reveal more about the laws of protein design.