Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Ananthan Nambiar
Maeve Elizabeth Heflin
Simon Liu
Sergei Maslov
Anna Ritz

BCB, pp. 1-16, 2020.

DOI: https://doi.org/10.1145/3388440.3412467

Abstract:

The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network architecture, called PRoBERTa, for protein characterization tasks, and demonstrate it on Protein Family Classification and Protein-Protein Interaction Prediction.

Introduction
  • The advent of new protein sequencing technologies has accelerated the rate of protein discovery [1].
  • Often referred to as word embeddings, these vector representations are typically “pre-trained” on an auxiliary task for which the authors have a large amount of training data
  • The goal of this pre-training is to learn generically useful representations that encode deep semantic and syntactic information [12].
  • These “smart” representations can be used to train systems for NLP tasks for which the authors have only a moderate amount of training data
Highlights
  • The advent of new protein sequencing technologies has accelerated the rate of protein discovery [1]
  • We show PRoBERTa’s performance when the model is fine-tuned for the Protein Family Classification and Protein-Protein Interaction (PPI) Prediction tasks
  • We propose a Transformer based neural network architecture, called PRoBERTa, for protein characterization tasks
  • We used embeddings from PRoBERTa for a fundamentally different problem, PPI Prediction, using two different datasets generated from the HIPPIE database and found that with sufficient data, it substantially outperforms the current state-of-the-art method in the conservative scenario and still performs better than the other methods in the aggressive scenario
  • This, combined with the larger decrease in Normalized Mutual Information (NMI) with protein families in the aggressive scenario (Figure 4), suggests that the model in the conservative scenario performs something closer to a protein classification task to identify which proteins are present in the HIPPIE dataset and are more likely to correspond to positive interaction examples
  • PRoBERTa’s success in these two different protein prediction tasks alludes to the generality of the embeddings and their potential to be used in other tasks such as predicting protein binding affinity, protein interaction types and identifying proteins associated with particular diseases
Methods
  • The authors treat proteins as a “language” and draw ideas from the state-of-the-art techniques in natural language processing to obtain a vector representation for proteins.
  • For a sequence of amino acids to be treated as a sentence, the alphabet of the language is defined to be the set of amino acid symbols.
  • Before amino acid sequences can be interpreted as a language, the authors must first define what a word is.
  • There has been recent interest [49] in statistically determining segments of amino acids to be used as inputs for downstream machine learning algorithms using an NLP method called byte pair encoding (BPE) [50].
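As a concrete illustration of this tokenization step, the sketch below trains a BPE vocabulary with SentencePiece (one of the subword tools cited in the reference list) and then segments a raw amino acid sequence into subword "words". The file names, vocabulary size, and example sequence are illustrative placeholders, not the paper's actual settings.

```python
import sentencepiece as spm

# Train a BPE vocabulary on a plain-text file with one amino acid sequence
# per line (file name and vocab size are placeholders, not the paper's values).
spm.SentencePieceTrainer.train(
    input="uniprot_sequences.txt",
    model_prefix="protein_bpe",
    vocab_size=10000,
    model_type="bpe",
)

# Segment a raw sequence into subword tokens for the Transformer.
sp = spm.SentencePieceProcessor(model_file="protein_bpe.model")
tokens = sp.encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", out_type=str)
print(tokens)  # e.g. ['▁MKT', 'AYIA', 'KQR', ...] depending on the learned merges
```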
Results
  • The authors first describe the sequence features learned from the pre-trained model, before the fine-tuning stage.
  • The authors show PRoBERTa’s performance when the model is fine-tuned for the Protein Family Classification and PPI Prediction tasks.
  • 3.1 Protein Embeddings from the Pre-Trained Model.
  • The authors pre-trained the PRoBERTa model as described in Section 2.3 on 4 NVIDIA V100 GPUs in 18 hours.
  • The authors first asked whether the pre-trained model captured any biological meaning from the amino acid sequences.
  • The pre-trained model is already able to distinguish between these protein families
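The clustering analysis alluded to here (and the NMI comparisons mentioned in the Highlights and Conclusion) can be outlined with scikit-learn, which the paper cites. The embedding matrix, family labels, embedding dimension, and cluster count below are random placeholders, since this summary does not give the exact clustering setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Placeholder inputs: one embedding vector per protein plus its family label.
# In the paper these come from PRoBERTa; here they are random stand-ins.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))        # (n_proteins, embedding_dim)
family_labels = rng.integers(0, 20, size=1000)   # integer-coded family IDs

# Cluster the embeddings and score agreement with the family annotation.
clusters = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(embeddings)
nmi = normalized_mutual_info_score(family_labels, clusters)
print(f"NMI between clusters and protein families: {nmi:.3f}")
```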
Conclusion
  • The authors propose a Transformer based neural network architecture, called PRoBERTa, for protein characterization tasks.
  • The authors used embeddings from PRoBERTa for a fundamentally different problem, PPI Prediction, on two datasets generated from the HIPPIE database; with sufficient data, PRoBERTa substantially outperforms the current state-of-the-art method in the conservative scenario and still performs better than the other methods in the aggressive scenario (a simplified pair-classification sketch follows this list).
  • This, combined with the larger decrease in NMI with protein families in the aggressive scenario (Figure 4), suggests that the model in the conservative scenario performs something closer to a protein classification task to identify which proteins are present in the HIPPIE dataset and are more likely to correspond to positive interaction examples.
  • In light of the COVID-19 pandemic, the authors are currently working on adapting PRoBERTa for vaccine design
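This summary does not spell out how a protein pair is presented to the model. The sketch below uses a simple baseline formulation, concatenating two proteins' embeddings and training a logistic regression over labeled pairs, rather than the paper's actual fine-tuning of the Transformer for the PPI task. All identifiers and data are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder embeddings for a toy "interactome": one vector per protein.
rng = np.random.default_rng(0)
protein_emb = {f"P{i:05d}": rng.normal(size=128) for i in range(500)}

# Placeholder interaction pairs and labels (in the paper, positives come from
# HIPPIE and negatives are sampled protein pairs).
names = list(protein_emb)
pairs = [(rng.choice(names), rng.choice(names)) for _ in range(5000)]
labels = rng.integers(0, 2, size=5000)

# Represent each pair by concatenating the two protein embeddings.
X = np.array([np.concatenate([protein_emb[a], protein_emb[b]]) for a, b in pairs])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))  # ~0.5 on random placeholders
```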
Summary
  • Objectives:

    In the pre-training stage, the objective is to train the model to learn task-agnostic deep representations that capture the high-level structure of amino acid sequences (a minimal sketch of this kind of objective follows).
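The pre-training objective is not spelled out in this summary; PRoBERTa builds on RoBERTa, which is pre-trained with a masked-token (masked language modeling) objective, so the sketch below illustrates that style of objective on a toy tokenized sequence. The 15% masking rate and the example tokens are assumptions for illustration, not values taken from the paper.

```python
import random

def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, seed=1):
    """Randomly replace a fraction of tokens with <mask>; the model is then
    trained to recover the original tokens at the masked positions.
    mask_prob=0.15 is the BERT/RoBERTa default, assumed here for illustration."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)        # token the model must predict
        else:
            masked.append(tok)
            targets.append(None)       # position not scored
    return masked, targets

# Toy BPE-style segmentation of an amino acid sequence (illustrative tokens).
tokens = ["MKT", "AYIA", "KQR", "QIS", "FVK", "SHFS"]
masked, targets = mask_tokens(tokens)
print(masked)   # with seed=1: ['<mask>', 'AYIA', 'KQR', 'QIS', 'FVK', 'SHFS']
print(targets)  # with seed=1: ['MKT', None, None, None, None, None]
```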
Tables
  • Table 1: Comparison of binary family classification
  • Table 2: Comparison of multi-class family classification
  • Table 3: PPI prediction results using 20% of training data (top) and using 100% of training data (bottom)
Funding
  • This work has been supported by the National Science Foundation (awards #1750981 and #1725729)
  • This work has also been partially supported by the Google Cloud Platform research credits program (to AR, MH, and AN)
Study subjects and analysis
proteins: 50
Clustering the vectors fine-tuned on the protein family classification task increases the NMI even more than the pre-trained model (Figure 4), suggesting that the fine-tuned embeddings carry more specific information related to protein classification. In the binary classification task, we trained a separate logistic regression classifier for each protein family with more than 50 proteins, randomly withholding 30% of the proteins from each family as the test set; the weighted mean accuracy was 0.98, with the lowest-scoring family (57 proteins) reaching an accuracy of 0.77.
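A minimal scikit-learn sketch of this per-family evaluation is shown below. The embeddings and family labels are random placeholders; the >50-protein threshold, the 30% hold-out, and the weighted mean accuracy follow the description above, although the paper's exact split procedure may differ.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def per_family_binary_accuracy(embeddings, families, min_size=50, test_frac=0.3):
    """One-vs-rest logistic regression per family with >min_size proteins,
    evaluated on a random 30% hold-out; returns the weighted mean accuracy."""
    counts = Counter(families)
    accs, weights = [], []
    for fam, n in counts.items():
        if n <= min_size:
            continue
        y = (families == fam).astype(int)   # binary target: in family or not
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, y, test_size=test_frac, stratify=y, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
        weights.append(n)
    return np.average(accs, weights=weights)

# Placeholder data; in the paper the rows are PRoBERTa protein embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2000, 64))
fams = rng.choice(np.array(["kinase", "gpcr", "protease", "other"]), size=2000)
print(per_family_binary_accuracy(emb, fams))
```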

UniProt proteins with only one associated family: 313214
In the multi-class family classification task, we used fine-tuning to add an output layer that maps to protein family labels, using the dataset of 313,214 UniProt proteins with only one associated family. These proteins were split into train/validation/test sets (0.8/0.1/0.1), yielding a classifier with an accuracy of 0.92 on the test set.
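The 0.8/0.1/0.1 split can be reproduced in outline with two successive scikit-learn splits; the toy sequences and labels below are placeholders standing in for the 313,214 single-family UniProt proteins.

```python
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the single-family UniProt proteins.
proteins = [f"SEQ_{i}" for i in range(1000)]
labels = [i % 10 for i in range(1000)]          # toy family IDs

# First carve off 20%, then split it evenly into validation and test (0.8/0.1/0.1).
X_train, X_rest, y_train, y_rest = train_test_split(
    proteins, labels, test_size=0.2, stratify=labels, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
print(len(X_train), len(X_val), len(X_test))    # 800 100 100
```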

proteins: 250504
The PPI models appear to be more robust (they have smaller slopes) than the Protein Family model. However, it should be noted that the complete training set for the Protein Family model contained 250,504 proteins, while the PPI model had 480,455 interactions in the conservative scenario and 429,239 interactions in the aggressive scenario. This difference in robustness could be due to the absolute difference in the number of training data points.

References
  • Laura Restrepo-Pérez, Chirlmin Joo, and Cees Dekker. Paving the way to single-molecule protein sequencing. Nature Nanotechnology, 13(9):786–796, 2018.
  • The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research, 47(D1):D506–D515, 2018.
  • Minsik Oh, Seokjun Seo, Sun Kim, and Youngjune Park. DeepFam: deep learning based alignment-free method for protein family modeling and prediction. Bioinformatics, 34(13):i254–i262, 2018.
  • Muhao Chen, Chelsea J T Ju, Guangyu Zhou, Xuelu Chen, Tianran Zhang, Kai-Wei Chang, Carlo Zaniolo, and Wei Wang. Multifaceted protein-protein interaction prediction based on Siamese residual RCNN. Bioinformatics, 35(14):i305–i314, 2019.
  • Temple F Smith, Michael S Waterman, et al. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981.
  • Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Molecular Systems Biology, 12(7), 2016.
  • Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, Christopher A. Lavender, Srinivas C. Turaga, Amr M. Alexandari, Zhiyong Lu, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H. S. Segler, Simina M. Boca, S. Joshua Swamidass, Austin Huang, Anthony Gitter, and Casey S. Greene. Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society, Interface, 15(141):20170387, 2018. PMID: 29618526.
  • Jianzhu Ma, Michael Ku Yu, Samson Fong, Keiichiro Ono, Eric Sage, Barry Demchak, Roded Sharan, and Trey Ideker. Using deep learning to model the hierarchical structure and function of a cell. Nature Methods, 15(4):290–298, 2018.
  • Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10):983–987, 2018.
  • Alex Zhavoronkov, Yan A. Ivanenkov, Alex Aliper, Mark S. Veselov, Vladimir A. Aladinskiy, Anastasiya V. Aladinskaya, Victor A. Terentiev, Daniil A. Polykovskiy, Maksim D. Kuznetsov, Arip Asadulaev, Yury Volkov, Artem Zholus, Rim R. Shayakhmetov, Alexander Zhebrak, Lidiya I. Minaeva, Bogdan A. Zagribelnyy, Lennart H. Lee, Richard Soll, David Madge, Li Xing, Tao Guo, and Alán Aspuru-Guzik. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nature Biotechnology, 37(9):1038–1040, 2019.
  • Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
  • Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. CoRR, abs/1708.02709, 2017.
  • Mark A. Bedau, Nicholas Gigliotti, Tobias Janssen, Alec Kosik, Ananthan Nambiar, and Norman Packard. Open-ended technological innovation. Artificial Life, 25(1):33–49, 2019. PMID: 30933632.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc., 2013.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
  • Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. CoRR, abs/1802.05365, 2018.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, 2019. Association for Computational Linguistics.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.
  • Ehsaneddin Asgari and Mohammad R. K. Mofrad. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLOS ONE, 10(11):1–15, 2015.
  • Michael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nachaev, Florian Matthes, and Burkhard Rost. Modeling the language of life – deep learning protein sequences. bioRxiv, 2019.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach, 2020.
  • Natalie L. Dawson, Ian Sillitoe, Jonathan G. Lees, Su Datt Lam, and Christine A. Orengo. CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences, pages 79–110. Springer New York, New York, NY, 2017.
  • Julian Gough, Kevin Karplus, Richard Hughey, and Cyrus Chothia. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology, 313(4):903–919, 2001.
  • Huaiyu Mi, Sagar Poudel, Anushya Muruganujan, John T. Casagrande, and Paul D. Thomas. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Research, 44(D1):D336–D342, 2015.
  • Marco Punta, Penny C. Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, and Robert D. Finn. The Pfam protein families database. Nucleic Acids Research, 40(D1):D290–D301, 2011.
  • Sayoni Das and Christine A. Orengo. Protein function annotation using protein domain family resources. Methods, 93:24–34, 2016.
  • Maxwell L. Bileschi, David Belanger, Drew Bryant, Theo Sanderson, Brandon Carter, D. Sculley, Mark A. DePristo, and Lucy J. Colwell. Using deep learning to annotate the protein universe. bioRxiv, 2019.
  • Nils Strodthoff, Patrick Wagner, Markus Wenzel, and Wojciech Samek. UDSMProt: universal deep sequence models for protein classification. Bioinformatics, btaa003, 2020.
  • Javier De Las Rivas and Celia Fontanillo. Protein-protein interactions essentials: Key concepts to building and analyzing interactome networks. PLOS Computational Biology, 6(6):1–8, 2010.
  • Tuba Sevimoglu and Kazim Yalcin Arga. The role of protein interaction networks in systems biomedicine. Computational and Structural Biotechnology Journal, 11(18):22–27, 2014.
  • Uros Kuzmanov and Andrew Emili. Protein-protein interaction networks: probing disease mechanisms using model systems. Genome Medicine, 5(4), 2013.
  • Ioanna Petta, Sam Lievens, Claude Libert, Jan Tavernier, and Karolien De Bosscher. Modulation of protein–protein interactions for the development of novel therapeutics. Molecular Therapy, 24(4):707–718, 2016.
  • Diego Alonso-López, Francisco J Campos-Laborie, Miguel A Gutiérrez, Luke Lambourne, Michael A Calderwood, Marc Vidal, and Javier De Las Rivas. APID database: redefining protein-protein interaction experimental evidences and binary interactomes. Database, 2019, 2019.
  • Alberto Calderone, Luisa Castagnoli, and Gianni Cesareni. mentha: a resource for browsing integrated protein-interaction networks. Nature Methods, 10(8):690–691, 2013.
  • Henning Hermjakob, Luisa Montecchi-Palazzi, Chris Lewington, Sugath Mudali, Samuel Kerrien, Sandra Orchard, Martin Vingron, Bernd Roechert, Peter Roepstorff, Alfonso Valencia, Hanah Margalit, John Armstrong, Amos Bairoch, Gianni Cesareni, David Sherman, and Rolf Apweiler. IntAct: an open source molecular interaction database. Nucleic Acids Research, 32(Database issue):D452–D455, 2004.
  • Luana Licata, Leonardo Briganti, Daniele Peluso, Livia Perfetto, Marta Iannuccelli, Eugenia Galeota, Francesca Sacco, Anita Palma, Aurelio Pio Nardozza, Elena Santonico, Luisa Castagnoli, and Gianni Cesareni. MINT, the molecular interaction database: 2012 update. Nucleic Acids Research, 40(Database issue):D857–D861, 2012.
  • Ulrich Stelzl, Uwe Worm, Maciej Lalowski, Christian Haenig, Felix H. Brembeck, Heike Goehler, Martin Stroedicke, Martina Zenkner, Anke Schoenherr, Susanne Koeppen, Jan Timm, Sascha Mintzlaff, Claudia Abraham, Nicole Bock, Silvia Kietzmann, Astrid Goedde, Engin Toksöz, Anja Droege, Sylvia Krobitsch, Bernhard Korn, Walter Birchmeier, Hans Lehrach, and Erich E. Wanker. A human protein-protein interaction network: A resource for annotating the proteome. Cell, 122(6):957–968, 2005.
  • Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, Lars J Jensen, and Christian von Mering. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2018.
  • Yanzhi Guo, Lezheng Yu, Zhining Wen, and Menglong Li. Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences. Nucleic Acids Research, 36(9):3025–3030, 2008.
  • Xue-Wen Chen and Mei Liu. Prediction of protein-protein interactions using random decision forest framework. Bioinformatics, 21(24):4394–4400, 2005.
  • Shao-Wu Zhang, Li-Yang Hao, and Ting-He Zhang. Prediction of protein-protein interaction with pairwise kernel support vector machine. International Journal of Molecular Sciences, 15(2):3220–3233, 2014.
  • Yi Guo and Xiang Chen. A deep learning framework for improving protein interaction prediction using sequence properties. bioRxiv, 2019.
  • Tanlin Sun, Bo Zhou, Luhua Lai, and Jianfeng Pei. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics, 18(1):277, 2017.
  • Alexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv, 2019.
  • Yang You, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. Reducing BERT pre-training time from 3 days to 76 minutes. CoRR, abs/1904.00962, 2019.
  • Nomenclature and symbolism for amino acids and peptides. European Journal of Biochemistry, 138(1):9–37, 1984.
  • Somaye Hashemifar, Behnam Neyshabur, Aly A Khan, and Jinbo Xu. Predicting protein-protein interactions through sequence-based deep learning. Bioinformatics, 34(17):i802–i810, 2018.
  • Ehsaneddin Asgari, Alice C. McHardy, and Mohammad R. K. Mofrad. Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX). Scientific Reports, 9(1):3577, 2019.
  • Philip Gage. A new algorithm for data compression. C Users Journal, 12(2):23–38, 1994.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, 2016. Association for Computational Linguistics.
  • Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, 2018.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • Dan Hendrycks and Kevin Gimpel. Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv preprint arXiv:1606.08415, 2016.
  • Gregorio Alanis-Lobato, Miguel A. Andrade-Navarro, and Martin H. Schaefer. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Research, 45(D1):D408–D414, 2016.
  • Tobias Hamp and Burkhard Rost. Evolutionary profiles improve protein-protein interaction prediction from sequence. Bioinformatics, 31(12):1945–1950, 2015.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations, 2019.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is "nearest neighbor" meaningful? In Proceedings of the 7th International Conference on Database Theory, ICDT '99, pages 217–235, Berlin, Heidelberg, 1999. Springer-Verlag.
  • Ananthan Nambiar, Mark Hopkins, and Anna Ritz. Computing the language of life: NLP approaches to feature extraction for protein classification. In ISMB/ECCB 2019: Poster Session, 2019.