G-DipC: an Improved Feature Representation Method for Short Sequences to Predict the Type of Cargo in Cell-Penetrating Peptides.

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2020)

Abstract
Cell-penetrating peptides (CPPs) are functional short peptides with high carrying capacity. CPP sequences with targeting functions enable the highly efficient delivery of drugs to target cells. This paper focuses on predicting the cargo category of CPPs: a biocomputational model is constructed to efficiently distinguish, among the seven known deliverable cargo categories, the category of cargo carried by CPPs acting as macromolecular carriers. Based on dipeptide composition (DipC), an improved feature representation method, general dipeptide composition (G-DipC), is proposed for short peptide sequences and can effectively increase the abundance of features represented. Linear discriminant analysis (LDA) is then applied to extract important low-dimensional features from G-DipC, and a predictive model is built with the XGBoost algorithm. Experimental results with five-fold cross-validation show that G-DipC improves accuracy by 25 and 5 percent compared with amino acid composition (AAC) and DipC, respectively. G-DipC is even found to outperform tripeptide composition (TipC). Thus, the proposed model provides a novel resource for the study of cell-penetrating peptides, and the improved dipeptide composition G-DipC can be widely adapted to the feature representation of other biological sequences.
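For orientation, the two baseline representations named in the abstract, AAC and DipC, can be sketched in a few lines of Python. This is a minimal illustration of the standard definitions (normalized counts of single residues and of adjacent residue pairs), not the authors' code. The abstract does not specify how G-DipC is constructed, so it is deliberately not reproduced here. The function names `aac` and `dipc`, the alphabet ordering, and the example peptide are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same alphabet order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(sequence):
    """Amino acid composition: 20 normalized single-residue frequencies."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols
            counts[residue] += 1
            total += 1
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipc(sequence):
    """Dipeptide composition: 400 normalized adjacent-pair frequencies."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


# Illustrative input only: a short arginine-rich peptide.
peptide = "RKKRRQRRR"
features = dipc(peptide)
nonzero = sum(1 for value in features if value > 0)
print(len(aac(peptide)), len(features), nonzero)
```

For a nine-residue peptide there are only eight adjacent pairs, so at most eight of the 400 DipC entries can be nonzero. That sparsity on short sequences is the kind of limitation the abstract's phrase "increase the abundance of features represented" appears to address.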
Key words
Peptides, Biological system modeling, Correlation, Predictive models, Proteins, Feature extraction, Cell-penetrating peptides, type of cargo, general dipeptide composition, linear discriminant analysis, XGBoost algorithm