Designing meaningful continuous representations of T cell receptor sequences with deep generative models.

Nature communications(2024)

引用 0|浏览1
暂无评分
摘要
T Cell Receptor (TCR) antigen binding underlies a key mechanism of the adaptive immune response yet the vast diversity of TCRs and the complexity of protein interactions limits our ability to build useful low dimensional representations of TCRs. To address the current limitations in TCR analysis we develop a capacity-controlled disentangling variational autoencoder trained using a dataset of approximately 100 million TCR sequences, that we name TCR-VALID. We design TCR-VALID such that the model representations are low-dimensional, continuous, disentangled, and sufficiently informative to provide high-quality TCR sequence de novo generation. We thoroughly quantify these properties of the representations, providing a framework for future protein representation learning in low dimensions. The continuity of TCR-VALID representations allows fast and accurate TCR clustering and is benchmarked against other state-of-the-art TCR clustering tools and pre-trained language models.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要