CIMerge: A machine learning approach for merging and genotyping complex indel calls from NGS data

Cancer Research (2018)

Abstract
A complex insertion and deletion (complex indel) is a rare category of genomic structural variation, formed when one or more DNA fragments are inserted at the genomic location where a deletion occurs. A recent systematic analysis of over 8,000 pan-cancer cases reported hundreds of complex indels in cancer-associated genes, some of which are considered potentially druggable. Several approaches have been proposed to detect complex indels from next-generation sequencing (NGS) data (both 2nd and 3rd generation). However, because data-mining algorithms differ in which data patterns they preferentially capture, different approaches may report conflicting complex indel calls. Here, we propose a machine learning approach that resolves conflicting calls from different approaches and further estimates the genotype of each call. The proposed approach, implemented as CIMerge, adopts a relevance vector machine framework. For each candidate call, CIMerge first extracts a set of features from the candidate region, including the read depth, variant allelic frequency, number of splitting/unmapped reads, number of discordant paired-end reads, aligned contig, etc. CIMerge also considers a second set of features describing the data-mining algorithm(s) that reported the candidate call, including their parameter settings, etc. Both feature sets are used to train the relevance vector machine, which outputs the probability of each candidate among the conflicting calls. As a byproduct, it outputs the genotype of the candidate call with the highest likelihood. We tested CIMerge on multiple datasets generated under different simulation configurations and compared it with several state-of-the-art approaches. The experimental results demonstrate that CIMerge outperforms the existing approaches: its average recognition success rate approaches 90%, whereas those of Pindel and Gindel are 62.53% and 65.202%, respectively.
The software package CIMerge is freely available for academic use at https://github.com/xjtu712-lab/CIMerge.

Citation Format: Tian Zheng, Yang Li, Yu Geng, Zhongmeng Zhao, Xuanping Zhang, Jiayin Wang. CIMerge: A machine learning approach for merging and genotyping complex indel calls from NGS data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5294.
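
The merging step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the CIMerge implementation: the class `CandidateCall`, the functions `feature_vector` and `merge_conflicting_calls`, the one-hot encoding of the reporting caller, and the `predict_proba` callable are all hypothetical names introduced here. In particular, `predict_proba` stands in for a trained relevance vector machine, which the sketch does not implement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class CandidateCall:
    """One complex indel call reported by a caller (field names are illustrative)."""
    caller: str                 # tool that reported the call, e.g. "Pindel"
    read_depth: float           # read depth over the candidate region
    variant_allele_freq: float  # variant allelic frequency
    split_reads: int            # number of splitting/unmapped reads
    discordant_pairs: int       # number of discordant paired-end reads


def feature_vector(call: CandidateCall, caller_index: Dict[str, int]) -> List[float]:
    """Concatenate region features with caller features.

    Assumption: the caller-side features are reduced to a one-hot encoding of
    the reporting tool; the abstract also mentions parameter settings.
    """
    one_hot = [0.0] * len(caller_index)
    one_hot[caller_index[call.caller]] = 1.0
    region = [
        call.read_depth,
        call.variant_allele_freq,
        float(call.split_reads),
        float(call.discordant_pairs),
    ]
    return region + one_hot


def merge_conflicting_calls(
    calls: Sequence[CandidateCall],
    caller_index: Dict[str, int],
    predict_proba: Callable[[List[float]], float],
) -> Tuple[CandidateCall, float]:
    """Return the candidate with the highest predicted probability.

    `predict_proba` is a placeholder for a trained probabilistic classifier
    (an RVM in the abstract); any callable mapping features to [0, 1] works.
    """
    if not calls:
        raise ValueError("at least one candidate call is required")
    probs = [predict_proba(feature_vector(c, caller_index)) for c in calls]
    best = max(range(len(calls)), key=probs.__getitem__)
    return calls[best], probs[best]


if __name__ == "__main__":
    # Toy data and a toy scorer (returns the allele frequency), for illustration only.
    callers = {"Pindel": 0, "Gindel": 1}
    conflicting = [
        CandidateCall("Pindel", 30.0, 0.20, 3, 1),
        CandidateCall("Gindel", 32.0, 0.45, 6, 2),
    ]
    chosen, prob = merge_conflicting_calls(conflicting, callers, lambda x: x[1])
    print(chosen.caller, prob)
```

Passing the scorer in as a callable keeps the selection logic independent of the classifier, so a trained model's probability function could be substituted for the toy scorer without changing the rest of the sketch. Genotype estimation is left out because the abstract does not describe how it is computed.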