A Better Scoring Model For De Novo Peptide Sequencing: The Symmetric Difference Between Explained And Measured Masses

Algorithms for Molecular Biology(2016)

引用 3|浏览10
暂无评分
摘要
Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few others are given in addition. For this setting we ask for an amino acid string that explains the given masses as accurately as possible. Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. That is, we propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Experiments on measurements of 342 synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible.
更多
查看译文
关键词
Directed Acyclic Graph, Dynamic Programming Algorithm, Longe Path, Fragment Masse, Sequencing Problem
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要