A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes

bioRxiv (2024)

Abstract
Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, which integrates genomic-, transcriptomic- and protein-derived evidence and has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with ‘high confidence’. The genes situated in the genomic space between the high confidence genes are predicted in the next stage. The set of high confidence genes serves as an initial training set for the statistical model. The model parameters are then iteratively updated in rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperformed gene finders that use a single type of extrinsic evidence. Comparisons with MAKER2 and TSEBRA, gene finders that use both transcript- and protein-derived extrinsic evidence, demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy, with its margin over existing approaches increasing for larger and more complex eukaryotic genomes.

Competing Interest Statement

GeneMark-ETP and its component GeneMark.hmm are distributed under the Creative Commons license. Licensing of GeneMark.hmm by commercial companies may create a conflict of interest for AL and MB. TB declares having no competing interests.
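The training scheme described in the abstract (seed the model with high confidence genes, then alternate gene prediction and parameter re-estimation until convergence) can be illustrated with a minimal toy sketch. This is not the GeneMark-ETP implementation: the `Gene` representation, the `iterative_training` function, the length-based `toy_train` and `toy_predict` stand-ins, and the `CANDIDATES` data are all hypothetical and chosen only to make the loop structure concrete.

```python
from typing import Callable, FrozenSet, Set, Tuple

# Hypothetical representation of a gene: (start, end) coordinates.
Gene = Tuple[int, int]


def iterative_training(
    high_confidence: Set[Gene],
    train: Callable[[Set[Gene]], float],
    predict: Callable[[float], Set[Gene]],
    max_rounds: int = 10,
) -> Set[Gene]:
    """Alternate parameter estimation and prediction until the gene set stops changing."""
    # The high confidence genes form the initial training set.
    previous: FrozenSet[Gene] = frozenset(high_confidence)
    training_set = set(high_confidence)
    for _ in range(max_rounds):
        model = train(training_set)      # parameter (re-)estimation
        predicted = predict(model)       # gene prediction with the current model
        current = frozenset(high_confidence | predicted)
        if current == previous:          # convergence: predictions are stable
            break
        previous = current
        training_set = set(current)
    return set(previous)


# Toy stand-ins (assumptions for illustration only): the "model" is just
# the mean gene length, and a candidate is "predicted" when its length is
# within 50 bp of that mean.
CANDIDATES: Set[Gene] = {(100, 400), (500, 820), (900, 2000)}


def toy_train(genes: Set[Gene]) -> float:
    return sum(end - start for start, end in genes) / len(genes)


def toy_predict(mean_length: float) -> Set[Gene]:
    return {g for g in CANDIDATES if abs((g[1] - g[0]) - mean_length) <= 50}


final_genes = iterative_training({(0, 300)}, toy_train, toy_predict)
```

In this sketch the high confidence genes are always retained in the output and only the remaining candidates are re-evaluated each round, which loosely mirrors the two-stage description in the abstract. A real gene finder would estimate a full statistical model rather than a single length parameter.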