Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data

BMC Molecular and Cell Biology (2019)

Abstract
Background: To date, no claim of a consensus sequon for O-glycosylation has been made. Predicting the likelihood of O-glycosylation from sequence and structural information using classical regression analysis is therefore quite difficult. In particular, if a binary response is used to distinguish between O-glycosylated and non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find.

Results: Three sequences from similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S/T-site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. The findings include:
1) The consensus composite sequon for O-glycosylation is ~(W–S/T–W), where "~" denotes the "not" operator.
2) The consensus sequon for phosphorylation is ~(W–S/T/Y/H–W), although W–S/T/Y/H–W is not an absolute inhibitor of phosphorylation.
3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N–~P–S/T is not an absolute inhibitor of O-glycosylation.
4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins.
5) Some N-glycosylated sequences are also phosphorylated at the S/T-site in the N–~P–S/T sequon.
6) ASA values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences.
7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%.
8) With only sequence data, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not being glycosylated. The 95% confidence interval around this misprediction rate is 16% to 26%.

Conclusions: The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.
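
The composite sequon reported in the Results, ~(W–S/T–W), can be read as a simple check on the residues flanking a candidate site. The following is a minimal sketch, not the authors' code. It assumes that "W" is the one-letter code for tryptophan and that the pattern refers to the residues immediately before and after the S/T site; the function name and the example peptides are hypothetical.

```python
def satisfies_composite_sequon(seq: str, site: int) -> bool:
    """Return True if the S/T at 0-based index `site` satisfies ~(W-S/T-W).

    Assumption: the sequon is violated only when the residues immediately
    before AND after the site are both W (tryptophan).
    """
    if seq[site] not in "ST":
        raise ValueError("site must point at a serine (S) or threonine (T)")
    # Residues beyond either end of the sequence are treated as "not W".
    left = seq[site - 1] if site > 0 else ""
    right = seq[site + 1] if site + 1 < len(seq) else ""
    return not (left == "W" and right == "W")


# Hypothetical peptides, for illustration only.
flanked = satisfies_composite_sequon("AWSWA", 2)      # W on both sides
one_sided = satisfies_composite_sequon("AWSAA", 2)    # W on one side only
terminal = satisfies_composite_sequon("TWA", 0)       # site at the N-terminus
```

Under these assumptions, only a site with tryptophan on both sides fails the check; a single flanking W does not.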
Keywords
O-glycosylation, N-glycosylation, phosphorylation, consensus sequon, linear probability model, ridge regression
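
As a rough illustration of the two techniques named in the abstract and keywords, the sketch below fits a linear probability model by ridge regression and scores it with a two-sample KS statistic. It is not the authors' implementation: the abstract does not state the penalty value, the feature encoding, or the exact KS definition, so the penalty `lam`, the synthetic features, and the choice of the maximum gap between the two classes' empirical score distributions are all assumptions made here.

```python
import numpy as np


def ridge_lpm_fit(X, y, lam):
    """Ridge estimate of LPM coefficients (intercept left unpenalized)."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    penalty = lam * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0                          # do not shrink the intercept
    # Solve (X'X + lam*I) beta = X'y
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def ridge_lpm_predict(beta, X):
    """Linear predictions; an LPM can return values outside [0, 1]."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def ks_statistic(scores, y):
    """Largest gap between the empirical CDFs of scores for y==1 and y==0."""
    pos = np.sort(scores[y == 1])
    neg = np.sort(scores[y == 0])
    grid = np.concatenate([pos, neg])
    cdf_pos = np.searchsorted(pos, grid, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, grid, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


# Synthetic data standing in for encoded sequence/structure features.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3)) + 1.5 * y[:, None]

beta = ridge_lpm_fit(X, y.astype(float), lam=1.0)  # lam is an arbitrary choice
scores = ridge_lpm_predict(beta, X)
ks = ks_statistic(scores, y)
print(f"KS statistic: {ks:.2f}")
```

A KS value near 1 (100%) means the predicted scores for the two classes barely overlap, which is the sense in which the abstract's 99% and 80% figures compare the models with and without structural data.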