
Generation of ENSEMBL-based Proteogenomics Databases Boosts the Identification of Non-Canonical Peptides

Bioinformatics (2021)

Abstract
Summary: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs, and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD, and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, notably optimized target/decoy generation by the DecoyPyrat algorithm. Finally, we perform a reanalysis of four public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to more than 10% of the total number of peptides identified (43,501 out of 402,512).

Availability: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://github.com/nf-core/pgdb

Contact: Yasset Perez-Riverol (yperez@ebi.ac.uk), Rui Branca (rui.mamede-branca@ki.se)

Supplementary information: Supplementary data are available online.
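The three-frame translation mentioned in the summary can be illustrated with a minimal sketch. This is not pypgatk's API or implementation: the function names `translate` and `three_frame_translation` are hypothetical, the standard genetic code is assumed, and the input is assumed to be a transcript sequence already oriented 5' to 3', so only the three forward frames are translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table applies; "*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1 or 2).

    Incomplete trailing codons are dropped; codons containing characters
    outside T/C/A/G (for example N) are translated as "X".
    """
    seq = seq.upper()
    residues = []
    for i in range(frame, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


def three_frame_translation(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    # Hypothetical toy transcript, for illustration only.
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA")):
        print(frame, protein)
```

In a database-building setting, each translated frame would typically be split at stop codons and the resulting fragments filtered by length before being written to a FASTA file; those steps are left out here to keep the sketch short.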
Key words
metagenomics assembly, protein synthesis, sequence alignment