Algorithms for aligning and clustering genomic sequences that contain duplications

2007

Abstract
Genomes of advanced organisms contain numerous repeated sequences, including gene clusters, tandem repeats, interspersed repeats, and segmental duplications. Among these, gene clusters are the class most frequently of functional importance. Algorithmic processing of regions containing these clusters remains challenging in practice, and the lack of clean solutions has been a major obstacle to sequence analysis in bioinformatics. This thesis presents new methodologies for solving two sets of problems in processing the sequences of gene-cluster regions, particularly methods to properly align gene-cluster regions of multiple species. Similar sequences sharing the same evolutionary origin are homologous. Homologous sequences that diverged through speciation are orthologous. One set of problems deals with aligning all and only orthologous sequences in a gene-cluster region, between two or more species. A two-way orthologous-sequence identification tool is developed to produce orthologous pairwise alignments. The results are evaluated based on the phylogenetic inference of gene sequences. High specificity is achieved without much loss of sensitivity. Two approaches are designed to create orthologous multi-species alignments. One uses a chosen species to guide the alignment process, and it has been successfully applied genome-wide. The other solves a more difficult formulation of the problem, where all species are treated equally. Its computational difficulty is discussed, and some initial experiments are reported. Another set of methods deals with the construction of all homologous groups within a single genome. Each homologous group is expected to contain precisely the genomic intervals that are homologous to each other. A mixture of algorithmic and heuristic procedures is designed to maintain a balance between the completeness and purity of each group. We verify the accuracy and efficiency of these methodologies.
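To make the notion of a homologous group concrete, the following is a minimal sketch, not the procedure developed in the thesis. It assumes that pairwise homology between genomic intervals is already known and simply takes the transitive closure of those pairs with a union-find structure. The function name `homologous_groups`, the interval tuples, and the pair list are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def homologous_groups(intervals, homologous_pairs):
    """Group intervals by the transitive closure of pairwise homology.

    Illustrative assumption: this is plain single-linkage grouping,
    not the algorithmic/heuristic mixture described in the abstract.
    """
    parent = {iv: iv for iv in intervals}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for iv in intervals:
        groups[find(iv)].append(iv)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical intervals on one genome: (chromosome, start, end).
intervals = [
    ("chr1", 100, 500),
    ("chr1", 900, 1300),
    ("chr2", 50, 450),
    ("chr3", 10, 200),
]
# Hypothetical pairwise hits, e.g. from aligning the genome to itself.
pairs = [
    (("chr1", 100, 500), ("chr1", 900, 1300)),
    (("chr1", 900, 1300), ("chr2", 50, 450)),
]
groups = homologous_groups(intervals, pairs)
```

In this toy input the first three intervals end up in one group and the `chr3` interval stays alone. Transitive closure favors completeness, but a single spurious pair can merge two unrelated groups and reduce purity, which is one reason a real method would likely need additional filtering of the kind the abstract alludes to.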
Keywords
gene sequence, orthologous multi-species alignment, orthologous sequence, gene-cluster region, align gene-cluster region, multiple species, clustering genomic sequence, orthologous pairwise alignment, chosen species, algorithmic processing, gene cluster, bioinformatics