535. Increasing Accuracy and Precision of Vector Integration Site Identification of Sequencing Reads with a New Bioinformatics Framework

Molecular Therapy (2015)

Abstract
In hematopoietic stem cell (HSC) gene therapy (GT) applications, patients are transplanted with autologous HSCs that have been genetically modified ex vivo with integration-competent vectors to express a therapeutic transgene. Specific PCR techniques coupled with next-generation sequencing and bioinformatics analysis allow the high-throughput retrieval, sequencing and mapping of proviral/genomic DNA junctions present in blood and bone marrow derived cell populations sampled at different time points after therapy. The increase in sequences available for integration site (IS) mapping is accompanied by an increase in false positives caused by sequencing errors or by errors in read parsing and mapping to the reference genome. In particular, by analyzing IS datasets from vector-marked human and mouse tumor cells, clones with defined integration sites, and GT patients, we observed that when multiple sequences arising from the same IS are aligned to the reference genome, >10% map near (within +/- 4 bases of) the true insertion site rather than exactly on it. Without correction, these misaligned sequences result not only in an overestimation of the overall number of IS but, in some cases, also in the generation of false common insertion sites, worrisome hallmarks of insertional mutagenesis. To mitigate this issue, we and others, based on empirical observations, merge sequencing reads mapping within +/- 3 bp into a single IS. Although this adjustment reduces the impact of the "wobbling" around the true IS, a dedicated method and model are still missing. To further increase the accuracy of genomic positioning of sequencing reads, we developed a new bioinformatics framework, usable as a post-processing plugin for existing pipelines, that partitions the sequencing reads at a given genomic position by considering the relative abundance and distribution of each sequence cluster. It uses local modes and Gaussian scores in an adaptive approach that varies the parameters of the Gaussian curve and proposes different candidate solutions.
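The empirical +/- 3 bp merge mentioned above can be illustrated with a minimal sketch. The abstract does not say how the merge is implemented, so the greedy "most abundant position first" rule, the function name `merge_reads_by_window`, and the toy coordinates below are all assumptions made for illustration, not the authors' code.

```python
from collections import Counter

def merge_reads_by_window(positions, window=3):
    """Collapse read positions within +/- `window` bp into a single IS.

    Assumption: the most abundant position is treated as the putative IS
    and absorbs every read mapped within the window around it.
    """
    remaining = Counter(positions)
    merged = {}
    # Visit positions from most to least abundant (ties broken by coordinate).
    for pos, _ in sorted(remaining.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining[pos] == 0:
            continue  # already absorbed by a more abundant neighbour
        total = 0
        for p in range(pos - window, pos + window + 1):
            total += remaining[p]
            remaining[p] = 0
        merged[pos] = total
    return merged

# Toy data: reads "wobbling" around two hypothetical sites at 100 and 200.
reads = [100] * 10 + [101] * 2 + [98] + [200] * 5 + [203]
print(merge_reads_by_window(reads))  # {100: 13, 200: 6}
```

A fixed window of this kind cannot distinguish one wobbling IS from two genuinely distinct IS that lie a few bases apart, which is the limitation that motivates a model-based method.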
To choose the best solution, the algorithm first evaluates each candidate using 100 simulations of the input reads and then selects the best one using the Kolmogorov-Smirnov test. The simulation step is designed to test the mappability of the IS genomic interval and to quantify the impact of the observed nucleotide variations of the reads with respect to the reference genome (PCR artifacts or real genomic differences), which may lead to different mapping results and so justify a larger span of mapped reads around the putative IS. The algorithm returns the list of IS and the relative number of reads, together with the p-value of the best solution. We performed 3 ad hoc in vitro experiments on a cell clone with 6 known IS, in which we measured the precision of IS placement: our new method achieved an average of 100%, whereas our previous method, based on a rigid sliding-window approach of 4 bp, achieved <30%. We applied the new approach to our clinical trial datasets and obtained improvements in IS genomic placement and in IS overestimation, with a 3% reduction in potential false IS, without changing the biological results.
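The simulate-then-compare selection step could look roughly like the sketch below. The abstract gives no implementation details, so everything here is an assumption: each candidate solution is modelled as a Gaussian mixture of `(center, sigma, weight)` components, candidates are ranked by their mean two-sample Kolmogorov-Smirnov statistic over 100 simulations (rather than by the p-value the authors report), and the names `ks_statistic`, `simulate_reads` and `select_solution` are hypothetical.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )

def simulate_reads(solution, n_reads, rng):
    """Draw read positions from a Gaussian mixture, one component per proposed IS."""
    weights = [w for _, _, w in solution]
    picks = rng.choices(range(len(solution)), weights=weights, k=n_reads)
    return [round(rng.gauss(solution[i][0], solution[i][1])) for i in picks]

def select_solution(observed, solutions, n_sim=100, seed=0):
    """Return the candidate whose simulated reads best match the observed ones."""
    rng = random.Random(seed)
    scores = {}
    for name, solution in solutions.items():
        stats = [
            ks_statistic(observed, simulate_reads(solution, len(observed), rng))
            for _ in range(n_sim)
        ]
        scores[name] = sum(stats) / n_sim  # lower mean statistic = closer match
    return min(scores, key=scores.get), scores

# Toy observed reads: two clusters, around positions 100 and 110.
observed = [100] * 55 + [101] * 3 + [99] * 2 + [110] * 38 + [111] * 2

# Two hypothetical candidate partitions: (center, sigma, weight) per IS.
solutions = {
    "one_site": [(100, 1.0, 1.0)],
    "two_sites": [(100, 0.5, 0.6), (110, 0.5, 0.4)],
}

best, scores = select_solution(observed, solutions)
print(best, scores)
```

The KS statistic is computed by hand here to keep the sketch dependency-free; `scipy.stats.ks_2samp` would additionally return a p-value. The sketch also omits the mappability and nucleotide-variation modelling that the simulation step is described as covering.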