THEA: A novel approach to gene identification in phage genomes

bioRxiv(2018)

Abstract
Motivation: Currently there are no tools specifically designed for annotating genes in phages. Several tools have been adapted to run on phage genomes, but because of their underlying design they are unable to capture the full complexity of those genomes. Phages have adapted their genomes to be extremely compact, with adjacent genes that overlap and genes that lie completely inside other, longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present THEA (The Algorithm), a novel method for gene calling specifically designed for phage genomes. While the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths, where open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph and use graph theory to find the optimal path.

Results: We compare THEA to other gene callers by annotating a set of 2,133 complete phage genomes from GenBank, using THEA and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with THEA predicting significantly more genes than the other three. We searched for these extra genes in both GenBank's non-redundant protein database and the Sequence Read Archive, and found that they are present at levels that suggest they are functional protein-coding genes.

Availability and Implementation: The source code and all files can be found at: https://github.com/deprekate/THEA
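To make the path-finding idea concrete, the following is a minimal Python sketch of gene calling as a lowest-cost path through a weighted graph. It is an illustration under stated assumptions, not THEA's actual implementation: the ORF scores, the `GAP_PENALTY` and `OVERLAP_PENALTY` constants, the function names, and the single-strand toy data are all hypothetical, and genes nested entirely inside other genes are not modeled. Each candidate ORF lowers the path cost by its score, while each base of gap or overlap between consecutive ORFs raises it.

```python
from typing import List, Tuple

# Hypothetical penalties chosen for illustration only (not taken from THEA).
GAP_PENALTY = 0.05     # cost per base of non-coding gap between consecutive genes
OVERLAP_PENALTY = 0.2  # cost per base of overlap between consecutive genes

Orf = Tuple[int, int, float]  # (start, end, score); a higher score is more gene-like


def connection_cost(prev_end: int, start: int) -> float:
    """Cost of the edge joining the end of one ORF to the start of the next."""
    distance = start - prev_end
    if distance >= 0:
        return distance * GAP_PENALTY   # a gap: unfavorable but allowed
    return -distance * OVERLAP_PENALTY  # an overlap: unfavorable but allowed


def best_path(orfs: List[Orf], genome_length: int) -> Tuple[float, List[Orf]]:
    """Return (total cost, chosen ORFs) for the lowest-cost path across the genome."""
    orfs = sorted(orfs)
    cost = [0.0] * len(orfs)  # cost[i]: cheapest path from the genome start ending in ORF i
    back = [-1] * len(orfs)   # back[i]: previous ORF on that path, or -1 if ORF i is first
    for i, (start, end, score) in enumerate(orfs):
        # Edge from the source node (position 0) straight to ORF i.
        cost[i] = connection_cost(0, start) - score
        for j in range(i):
            p_start, p_end, _ = orfs[j]
            # Consecutive genes must advance along the genome (no full nesting here).
            if p_start < start and p_end < end:
                candidate = cost[j] + connection_cost(p_end, start) - score
                if candidate < cost[i]:
                    cost[i], back[i] = candidate, j
    # Close each path at the sink node (end of genome); also consider calling no genes.
    best_total, best_last = genome_length * GAP_PENALTY, -1
    for i, (_, end, _) in enumerate(orfs):
        total = cost[i] + connection_cost(end, genome_length)
        if total < best_total:
            best_total, best_last = total, i
    path = []
    while best_last != -1:
        path.append(orfs[best_last])
        best_last = back[best_last]
    return best_total, path[::-1]


# Toy candidates on a 1,000-base genome: the second ORF overlaps the first by 10 bases.
EXAMPLE_ORFS = [(0, 300, 30.0), (290, 600, 25.0), (320, 580, 5.0), (650, 1000, 28.0)]

if __name__ == "__main__":
    total, genes = best_path(EXAMPLE_ORFS, 1000)
    print(total, genes)
```

Because the candidates are processed in order of start position, the graph is acyclic and a single dynamic-programming pass is enough, even though ORF scores make some edge weights negative. A general graph with negative weights would need an algorithm such as Bellman-Ford instead.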