Detecting Complex Indels With Wide Length-Spectrum From The Third Generation Sequencing Data

2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)(2017)

Abstract
Structural variations are a complex collection of mutations, many of which are reported to be associated with complex traits. Recent research reports a rare class of structural variants, complex indels, which may contribute to carcinogenesis. A complex indel typically presents multiple inserted nucleotides within a deleted region. Due to limitations in both data and algorithms, existing approaches can only detect complex indels shorter than 80 bp; however, longer ones are considered to have a stronger impact. In this paper, we propose a novel algorithm, SVseq3, which handles PacBio data and identifies long complex indels. The algorithm takes the BLASR alignment results and locates suspicious areas of complex indels by clustering. An improved similarity hash-based framework is then constructed. For each suspicious area, a continuing-seed strategy is adopted to split the inserted fragments and recover their original locations. The mapped segments, each consisting of a series of seeds, are used to further narrow down the intermediate breakpoints and identify the forms of the complex indels. SVseq3 is able to detect long complex indels as well as complex indels whose inserted fragments come from multiple sources. We test SVseq3 on multiple datasets with different simulation configurations and compare it to existing methods. The experimental results demonstrate that SVseq3 outperforms the existing approaches. Sensitivity and positive predictive rate reach around 70% and 85%, respectively, in some common simulation settings.
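The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea behind hash-based seeding with seed continuation, not SVseq3's actual code. It assumes exact k-mer seeds, a plain dictionary as the hash index, and a greedy rule that keeps the longest run of consecutive seeds; the function names build_kmer_index and chain_seeds and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def chain_seeds(fragment, index, k):
    """Split a fragment into segments of consecutive exact k-mer seeds.

    Assumption (not from the paper): a seed "continues" when the next
    k-mer of the fragment hits the next reference position.
    Returns a list of (fragment_start, reference_start, length) tuples.
    """
    segments = []
    i = 0
    while i + k <= len(fragment):
        best_len, best_ref = 0, -1
        for ref_pos in index.get(fragment[i:i + k], ()):
            j = 1
            # Extend while the shifted k-mer also maps to the shifted position.
            while (i + j + k <= len(fragment)
                   and ref_pos + j in index.get(fragment[i + j:i + j + k], ())):
                j += 1
            length = k + j - 1
            if length > best_len:
                best_len, best_ref = length, ref_pos
        if best_len:
            segments.append((i, best_ref, best_len))
            i += best_len  # continue after the mapped segment
        else:
            i += 1  # no seed here; slide by one base
    return segments


# Toy example: an inserted fragment stitched from two reference locations.
reference = "AAACCCGGGTTTACGTAGCTAGGATCCTGA"
index = build_kmer_index(reference, 4)
fragment = reference[20:28] + reference[2:9]
print(chain_seeds(fragment, index, 4))  # [(0, 20, 8), (8, 2, 7)]
```

In this toy case the fragment splits into two mapped segments whose reference origins differ, which mirrors the abstract's notion of an inserted fragment with multiple sources. Real long reads contain sequencing errors, so a practical method would need to tolerate mismatched seeds rather than rely on exact matches alone.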
Keywords
Structural variation, complex indel, detection method, hash-table based algorithm, the third generation sequencing data