Phylogenetic Approaches for Detecting Fragmentation in Genome and Transcriptome Annotations

Doctoral thesis, UCL (University College London) (2020)

Abstract
The invention of genome sequencing methods, together with the corresponding assembly and annotation algorithms, has transformed the landscape of biological research and innovation. Yet many assemblies and annotations remain fragmented, limiting applications that require more complete and reliable datasets. The goal of this thesis was to establish methods to detect fragmentation in genome and transcriptome annotations by exploiting available data from related species in a phylogenetic framework. Before applying the core methods to detect fragmentation, it is important to establish informative sequences from related species, i.e. putative homologs. This typically requires all-against-all protein-protein sequence comparison within and across the species in the dataset. To speed up this process, we developed an approach that attempts to incorporate the transitive property of homology and considers putative homology on putative protein subsequences. Putative homologs can then be used as input for our phylogenetic heuristics to detect fragments of the same gene model in the genome assembly of interest. One heuristic collapses internal tree branches with low SH-like branch support; the other exploits a likelihood ratio value. The heuristics found 1,221 pairs of distinct gene models in the challenging putative bread wheat genome which we believe are actually fragments of the same gene model. We also applied the heuristics to the putative genome of wild olive and identified 102 pairs of distinct gene models that are potentially fragments of the same model. Importantly, we provide guidelines on assessing predictions based on the data at hand. Finally, we started exploring the behaviour of the heuristics on transcript models constructed from the cassava transcriptome assembly. Due to time constraints, the outcomes of that study are limited, but they hopefully provide sound guidelines for further work.
The methods are not restricted to the plant kingdom and can already be used on any species in their current state.
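
As an illustration of the first heuristic mentioned above, collapsing internal tree branches with low branch support can be sketched as follows. This is a minimal sketch and not the thesis implementation: the `Node` class, the `collapse_low_support` function, the 0.9 threshold, and the toy tree with its leaf names are all hypothetical, and the abstract does not specify how the collapsed tree is then used to call fragments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical rooted-tree node used only for this illustration."""
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_low_support(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which every internal branch whose
    support is below `threshold` is removed, so that its children attach
    directly to the parent (forming a polytomy). The root is never removed."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        weak = (
            not collapsed.is_leaf()
            and collapsed.support is not None
            and collapsed.support < threshold
        )
        if weak:
            # Drop the weakly supported branch and promote its children.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


# Toy tree: (((fragA, ref1)0.35, (fragB, ref2)0.40)0.95, outgroup)
tree = Node(children=[
    Node(support=0.95, children=[
        Node(support=0.35, children=[Node("fragA"), Node("ref1")]),
        Node(support=0.40, children=[Node("fragB"), Node("ref2")]),
    ]),
    Node("outgroup"),
])

# Assumed threshold; an appropriate value depends on the data at hand.
collapsed = collapse_low_support(tree, threshold=0.9)
```

In this toy example the two weakly supported inner branches are removed, so `fragA` and `fragB` end up in the same well-supported clade with no supported branch separating them.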