Energy-based models for atomic-resolution protein conformations

ICLR (2020)


Abstract

We propose an energy-based model (EBM) of protein conformations that operates at atomic scale. The model is trained solely on crystallized protein data. By contrast, existing approaches for scoring conformations use energy functions that incorporate knowledge of physical principles and features that are the complex product of several decades […]
Introduction
  • Methods for the rational design of proteins make use of complex energy functions that approximate the physical forces that determine protein conformations (Cornell et al, 1995; Jorgensen et al, 1996; MacKerell Jr et al, 1998), incorporating knowledge about statistical patterns in databases of protein crystal structures (Boas & Harbury, 2007).
  • The physical approximations and knowledge-derived features that are included in protein design energy functions have been developed over decades, building on results from a large community of researchers (Alford et al, 2017).
  • In this work, the authors investigate learning an energy function for protein conformations directly from protein crystal structure data.
  • The major degrees of freedom in protein conformation are the dihedral rotations about the backbone bonds, termed the phi (φ) and psi (ψ) angles, and the dihedral rotations about the side chain bonds, termed the chi (χ) angles (Richardson & Richardson, 1989); a short example of computing a dihedral angle from atom coordinates follows this list.
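
As a point of reference for these degrees of freedom, the short Python sketch below computes a dihedral angle from four atom positions; the coordinates and the helper name dihedral are illustrative and are not taken from the paper.

    import numpy as np

    def dihedral(p0, p1, p2, p3):
        """Dihedral angle in degrees defined by four points, e.g. the backbone
        atoms C(i-1), N(i), CA(i), C(i) for the phi angle of residue i."""
        b0 = -(p1 - p0)
        b1 = p2 - p1
        b2 = p3 - p2
        b1 = b1 / np.linalg.norm(b1)
        # Project b0 and b2 onto the plane perpendicular to b1.
        v = b0 - np.dot(b0, b1) * b1
        w = b2 - np.dot(b2, b1) * b1
        x = np.dot(v, w)
        y = np.dot(np.cross(b1, v), w)
        return np.degrees(np.arctan2(y, x))

    # Illustrative coordinates (in angstroms) for four consecutive atoms.
    p = [np.array(a, dtype=float) for a in
         [(24.9, 13.5, 9.6), (24.0, 12.8, 8.9), (24.6, 11.6, 8.2), (23.7, 10.4, 8.0)]]
    print(round(dihedral(*p), 1))

The same computation over the appropriate side-chain atoms gives the chi angles whose recovery the benchmark in later sections measures.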
Highlights
  • Methods for the rational design of proteins make use of complex energy functions that approximate the physical forces that determine protein conformations (Cornell et al, 1995; Jorgensen et al, 1996; MacKerell Jr et al, 1998), incorporating knowledge about statistical patterns in databases of protein crystal structures (Boas & Harbury, 2007)
  • We find that a single model evaluated on the benchmark performs slightly worse than both versions of the Rosetta energy function
  • In this work we explore the possibility of learning an energy function of protein conformations at atomic resolution
  • We develop and evaluate the method in the benchmark problem setting of recovering protein side chain conformations from their native context, finding that, in this restricted domain, a learned energy function approaches the performance of energy functions that have been developed through years of research into approximating the physical forces guiding protein conformation and engineering statistical terms.
  • We find that learning an energy function from the data of protein crystal structures automatically discovers features relevant to computing molecular energies; and we observe that the model responds to its inputs in ways that are consistent with an intuitive understanding of protein conformation and energy
  • Huang et al (2016) have argued that since the physical principles that govern protein conformation apply to all proteins, encoding knowledge of these physical and biochemical principles into an energy function will make it possible to design de novo protein structures and functions that have not appeared before in nature
Methods
  • The authors' goal is to score molecular configurations of the protein side chains given a fixed target backbone structure.
  • The model calculates scalar functions, fθ(A), of size-k subsets, A, of atoms within a protein.
  • To select the atom subsets in the experiments, the authors choose A to be nearest-neighbor sets around the residues of the protein and set k = 64.
  • The authors construct A to be the k atoms that are nearest to the residue’s beta carbon.
  • The coordinates are normalized to have zero mean across the k atoms in the set; a minimal sketch of this atom-set construction follows this list.
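
The sketch below illustrates the preprocessing these bullets describe: gathering the k = 64 atoms nearest a residue's beta carbon and zero-centering their coordinates. The array layout and the function name local_atom_set are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def local_atom_set(coords, cb_index, k=64):
        """Return the coordinates of the k atoms nearest a residue's beta carbon,
        shifted to zero mean, along with the indices of the selected atoms.

        coords:   (N, 3) array of atomic coordinates for the whole structure
        cb_index: row index of the residue's beta-carbon atom
        """
        dists = np.linalg.norm(coords - coords[cb_index], axis=1)
        nearest = np.argsort(dists)[:k]        # includes the beta carbon itself
        subset = coords[nearest]
        subset = subset - subset.mean(axis=0)  # zero mean across the k atoms
        return subset, nearest

    # Illustrative usage with random coordinates standing in for a protein.
    coords = np.random.rand(500, 3) * 30.0
    local, idx = local_atom_set(coords, cb_index=42)
    print(local.shape, bool(np.allclose(local.mean(axis=0), 0.0)))

In the paper's notation, each such zero-centered subset A would then be scored by the learned scalar function fθ(A).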
Results
  • Results are compared to Rosetta: the authors ran Rosetta using the score12 and ref15 energy functions with the rotamer trials and rtmin protocols under default settings (a sketch of the rotamer-recovery criterion used for these comparisons follows this list).
  • Table 1 directly compares the EBM model with two versions of the Rosetta energy function.
  • The authors run Rosetta on the set of 152 proteins from the benchmark of Leaver-Fay et al (2013).
  • The authors include published performance on the same test set from Leaver-Fay et al (2013).
  • The authors find that a single model evaluated on the benchmark performs slightly worse than both versions of the Rosetta energy function.
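
A hedged sketch of the rotamer-recovery evaluation referred to above: each side chain's candidate conformations are scored with an energy function (the learned EBM or a Rosetta score would play this role), the lowest-energy candidate is selected, and a residue counts as recovered when the prediction matches the native chi angles. The 20-degree per-angle tolerance and the function names are assumptions based on common practice, not necessarily the exact criterion of the Leaver-Fay et al (2013) benchmark.

    import numpy as np

    def angular_diff(a, b):
        """Smallest absolute difference between two angle arrays, in degrees."""
        d = np.abs(np.asarray(a) - np.asarray(b)) % 360.0
        return np.minimum(d, 360.0 - d)

    def rotamer_recovery(energy_fn, candidates, native_chis, tol=20.0):
        """Fraction of residues whose minimum-energy candidate matches the native
        side-chain chi angles within tol degrees on every angle.

        energy_fn:   callable mapping one candidate's chi angles to a scalar energy
        candidates:  per-residue arrays of shape (num_rotamers, num_chi)
        native_chis: per-residue arrays of shape (num_chi,)
        """
        recovered = 0
        for cands, native in zip(candidates, native_chis):
            energies = np.array([energy_fn(c) for c in cands])
            best = cands[np.argmin(energies)]  # lowest-energy rotamer wins
            if np.all(angular_diff(best, native) <= tol):
                recovered += 1
        return recovered / len(native_chis)

Under discrete rotamer sampling, the candidates would come from a rotamer library such as Shapovalov & Dunbrack (2011); under continuous schemes such as rtmin, the fixed candidate set would be replaced by local optimization of the chi angles.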
Conclusion
  • In this work the authors explore the possibility of learning an energy function of protein conformations at atomic resolution.
  • The authors find that learning an energy function from the data of protein crystal structures automatically discovers features relevant to computing molecular energies; and the authors observe that the model responds to its inputs in ways that are consistent with an intuitive understanding of protein conformation and energy.
  • To create new proteins outside the space of those discovered by nature, it is necessary to use design principles that generalize to all proteins. Huang et al (2016) have argued that since the physical principles that govern protein conformation apply to all proteins, encoding knowledge of these physical and biochemical principles into an energy function will make it possible to design de novo protein structures and functions that have not appeared before in nature.
Tables
  • Table 1: Rotamer recovery of energy functions under the discrete rotamer sampling method detailed in Section 4.2.1. Parentheses denote value reported by Leaver-Fay et al (2013)
  • Table 2: Rotamer recovery of energy functions under continuous optimization schemes. Rosetta continuous optimization is performed with the rtmin protocol. Parentheses denote value reported by Leaver-Fay et al (2013)
  • Table 3: Comparison of rotamer recovery rates by amino acid between Rosetta and the ensembled energy-based model under discrete rotamer sampling. The model appears to perform well on the polar amino acids glutamine, serine, asparagine, and threonine, while Rosetta performs better on the larger amino acids phenylalanine, tyrosine, and tryptophan and on the common amino acid leucine. The numbers reported for Rosetta are from Leaver-Fay et al (2013)
Related work
  • Energy functions have been widely used in the modeling of protein conformations and the design of protein sequences and structures (Boas & Harbury, 2007). Rosetta, for example, uses a combination of physically motivated terms and knowledge-based potentials (Alford et al, 2017) to model proteins and other macromolecules. Leaver-Fay et al (2013) proposed optimizing the feature weights and parameters of the terms of an energy function for protein design; however, their method used physical features designed with expert knowledge and data analysis. Our work draws on their development of rigorous benchmarks for energy functions, but in contrast automatically learns complex features from data. Neural networks have also been explored for protein folding. Xu (2018) developed a deep residual network that predicts the pairwise distances between residues in the protein structure from evolutionary covariation information. Senior et al (2018) used evolutionary covariation to predict pairwise distance distributions, folding the protein by maximizing the probability of the backbone structure with respect to the predicted distance distribution. Ingraham et al (2018) proposed learning an energy function for protein folding by backpropagating through a differentiable simulator. AlQuraishi (2019) investigated predicting protein structure from sequence without using co-evolution.
Funding
  • Alexander Rives was supported by NSF Grant #1339362
References
  • Rebecca F Alford, Andrew Leaver-Fay, Jeliazko R Jeliazkov, Matthew J O’Meara, Frank P DiMaio, Hahnbeom Park, Maxim V Shapovalov, P Douglas Renfrew, Vikram K Mulligan, Kalli Kappel, et al. The Rosetta all-atom energy function for macromolecular modeling and design. Journal of chemical theory and computation, 13(6):3031–3048, 2017.
  • Ethan C. Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, and George M. Church. Unified rational protein engineering with sequence-only deep representation learning. bioRxiv, 2019. doi: 10.1101/ 589333. URL https://www.biorxiv.org/content/early/2019/03/26/589333.
  • Mohammed AlQuraishi. End-to-end differentiable learning of protein structure. Cell Systems, 8(4):292 – 301.e3, 2019. ISSN 2405-4712. doi: https://doi.org/10.1016/j.cels.2019.03.006. URL http://www.sciencedirect.com/science/article/pii/S2405471219300766.
  • Xavier I Ambroggio and Brian Kuhlman. Computational design of a single amino acid sequence that can switch between two distinct protein folds. Journal of the American Chemical Society, 128(4):1154–1161, 2006.
  • Namrata Anand and Possu Huang. Generative modeling for protein structures. In ICLR, 2018.
  • Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer Normalization. arXiv e-prints, art. arXiv:1607.06450, Jul 2016.
  • Tristan Bepler and Bonnie Berger. Learning protein sequence embeddings using information from structure. In International Conference on Learning Representations, 2019.
  • F Edward Boas and Pehr B Harbury. Potential energy functions for protein design. Current opinion in structural biology, 17(2):199–204, 2007.
  • Michael J Bower, Fred E Cohen, and Roland L Dunbrack Jr. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. Journal of molecular biology, 267 (5):1268–1282, 1997.
  • Scott E Boyken, Zibo Chen, Benjamin Groves, Robert A Langan, Gustav Oberdorfer, Alex Ford, Jason M Gilmore, Chunfu Xu, Frank DiMaio, Jose Henrique Pereira, et al. De novo design of protein homooligomers with modular hydrogen-bond network–mediated specificity. Science, 352(6286):680–687, 2016.
  • Wendy D Cornell, Piotr Cieplak, Christopher I Bayly, Ian R Gould, Kenneth M Merz, David M Ferguson, David C Spellmeyer, Thomas Fox, James W Caldwell, and Peter A Kollman. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19):5179–5197, 1995.
  • R. Das. Four small puzzles that Rosetta doesn’t solve. PLoS ONE, 6(5):e20044, 2011.
  • Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The helmholtz machine. Neural computation, 7(5):889–904, 1995.
  • Ken A Dill. Dominant forces in protein folding. Biochemistry, 29(31):7133–7155, 1990.
  • Yilun Du and Igor Mordatch. Implicit generation and generalization in energy-based models. arXiv 1903.08689, 2019.
  • Melissa A Edeling, Luke W Guddat, Renata A Fabianek, Linda Thony-Meyer, and Jennifer L Martin. Structure of CcmG/DsbE at 1.14 Å resolution: high-fidelity reducing activity in an indiscriminately oxidizing environment. Structure, 10(7):973–979, 2002.
  • Evan N. Feinberg, Debnil Sur, Zhenqin Wu, Brooke E. Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, and Vijay S. Pande. Potentialnet for molecular property prediction. ACS Central Science, 4(11):1520–1530, Nov 2018. ISSN 2374-7943. doi: 10.1021/acscentsci.8b00507. URL https://doi.org/10.1021/acscentsci.8b00507.
  • Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1263–1272, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/gilmer17a.html.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
  • Hong Guo and Dennis R Salahub. Cooperative hydrogen bonding and enzyme catalysis. Angewandte Chemie International Edition, 37(21):2985–2990, 1998.
  • Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.
  • Lisa Holm and Chris Sander. Fast and simple Monte Carlo algorithm for side chain optimization in proteins: application to model building by homology. Proteins: Structure, Function, and Bioinformatics, 14(2):213–223, 1992.
  • Po-Ssu Huang, Scott E Boyken, and David Baker. The coming of age of de novo protein design. Nature, 537 (7620):320, 2016.
  • John Ingraham, Adam Riesselman, Chris Sander, and Debora Marks. Learning protein structure with a differentiable simulator. 2018.
  • John Ingraham, Vikas K Garg, Regina Barzilay, and Tommi Jaakkola. Generative models for graph-based protein design. 2019.
  • Matthew P Jacobson, George A Kaminski, Richard A Friesner, and Chaya S Rapp. Force field validation using protein side chain prediction. The Journal of Physical Chemistry B, 106(44):11673–11680, 2002.
  • Joel Janin, Shoshanna Wodak, Michael Levitt, and Bernard Maigret. Conformation of amino acid side-chains in proteins. Journal of molecular biology, 125(3):357–386, 1978.
  • Lin Jiang, Eric A Althoff, Fernando R Clemente, Lindsey Doyle, Daniela Rothlisberger, Alexandre Zanghellini, Jasmine L Gallaher, Jamie L Betker, Fujie Tanaka, Carlos F Barbas, et al. De novo computational design of retro-aldol enzymes. science, 319(5868):1387–1391, 2008.
  • William L Jorgensen, David S Maxwell, and Julian Tirado-Rives. Development and testing of the opls all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.
  • Neil P King, Jacob B Bale, William Sheffler, Dan E McNamara, Shane Gonen, Tamir Gonen, Todd O Yeates, and David Baker. Accurate design of co-assembling multi-component protein nanomaterials. Nature, 510 (7503):103, 2014.
  • Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Brian Kuhlman, Gautam Dantas, Gregory C Ireton, Gabriele Varani, Barry L Stoddard, and David Baker. Design of a novel globular protein fold with atomic-level accuracy. science, 302(5649):1364–1368, 2003.
  • Themis Lazaridis and Martin Karplus. Effective energy functions for protein structure prediction. Current opinion in structural biology, 10(2):139–145, 2000.
  • Yann LeCun, Sumit Chopra, Raia Hadsell, M Ranzato, and F Huang. A tutorial on energy-based learning. Predicting structured data, 1(0), 2006.
  • Dennis R Livesay, Dang H Huynh, Sargis Dallakyan, and Donald J Jacobs. Hydrogen bond networks determine emergent mechanical and thermodynamic properties across a protein family. Chemistry Central Journal, 2(1):17, 2008.
  • Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov):2579–2605, 2008.
  • Alex D MacKerell Jr, Donald Bashford, MLDR Bellott, Roland Leslie Dunbrack Jr, Jeffrey D Evanseck, Martin J Field, Stefan Fischer, Jiali Gao, H Guo, Sookhee Ha, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The journal of physical chemistry B, 102(18): 3586–3616, 1998.
  • Jack B Maguire, Scott E Boyken, David Baker, and Brian Kuhlman. Rapid sampling of hydrogen bond networks for computational protein design. Journal of chemical theory and computation, 14(5):2751–2760, 2018.
  • Elman Mansimov, Omar Mahmood, Seokho Kang, and Kyunghyun Cho. Molecular geometry prediction using a deep generative graph neural network. arXiv preprint arXiv:1904.00314, 2019.
  • CA McPhalen and MNG James. Crystal and molecular structure of the serine proteinase inhibitor ci-2 from barley seeds. Biochemistry, 26(1):261–269, 1987.
  • Carl Pabo. Molecular technology: designing proteins and peptides. Nature, 301(5897):200, 1983.
  • Jasmina S Redzic and Bruce E Bowler. Role of hydrogen bond networks and dynamics in positive and negative cooperative stabilization of a protein. Biochemistry, 44(8):2900–2908, 2005.
  • Alexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv, 2019. doi: 10.1101/622803. URL https://www.biorxiv.org/content/early/2019/05/29/622803.
  • Andrew Senior, John Jumper, and Demis Hassabis. AlphaFold: Using AI for scientific discovery, 12 2018. URL https://deepmind.com/blog/alphafold/.
  • Maxim V Shapovalov and Roland L Dunbrack Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure, 19(6):844–858, 2011.
  • Manfred J Sippl. Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. Journal of molecular biology, 213(4):859–883, 1990.
  • Seiji Tanaka and Harold A Scheraga. Medium-and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules, 9(6):945–950, 1976.
  • P Tuffery, C Etchebest, Serge Hazout, and R Lavery. A new approach to the rapid determination of protein side chain conformations. Journal of Biomolecular structure and dynamics, 8(6):1267–1289, 1991.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391, 2015.
  • G. Wang and R. L. Dunbrack Jr. PISCES: a protein sequence culling server. Bioinformatics, 19:1589–1591, 2003.
  • Jingxue Wang, Huali Cao, John Z. H. Zhang, and Yifei Qi. Computational protein design with deep learning neural networks. Scientific Reports, 8(1):6349, 2018. ISSN 2045-2322. doi: 10.1038/s41598-018-24760-x. URL https://doi.org/10.1038/s41598-018-24760-x.
  • Jinbo Xu. Distance-based protein folding powered by deep learning. arXiv preprint arXiv:1811.03481, 2018.
  • Kevin K. Yang, Zachary Wu, and Frances H. Arnold. Machine-learning-guided directed evolution for protein engineering. Nature Methods, 16(8):687–694, 2019. ISSN 1548-7105. doi: 10.1038/s41592-019-0496-6. URL https://doi.org/10.1038/s41592-019-0496-6.
Author
Jerry Ma
Alexander Rives