339. GENIS: A Bioinformatics Tool for Reliable and Automated Genome Insertion Site Analysis

Molecular Therapy (2015)

Abstract
Over the last two decades, gene therapy has advanced rapidly as a promising approach to treating genetic diseases by introducing corrected genes into patient cells. Viruses are the most common carriers in vector-mediated gene therapy. However, integration of viral vectors at undesirable genomic locations can lead to deleterious effects, e.g., insertional mutagenesis. An efficient, stable, and safe vector system is therefore the major prerequisite for successful gene therapy. Long-term monitoring of the distribution pattern of vector integration sites (IS) is the most feasible strategy for addressing vector safety and stability concerns.

Recent advances in next-generation sequencing technologies have dramatically increased the possibility of generating substantial amounts of vector-genome sequencing data for comprehensive IS analysis. Efficient downstream analysis of these data requires fast, automated computational methods. Here, we present the Genome Insertion Site (GENIS) pipeline, a suite for time-efficient and reliable analysis of vector-genome junctions. GENIS is designed to analyze sequencing data generated by traditional linear amplification-mediated PCR (LAM-PCR) based methods as well as by newer targeted DNA single- and paired-end sequencing technologies (e.g., Agilent SureSelect). The suite consists of six basic modules, including barcode sorting, quality filtering and adapter trimming, mapping of sequence reads to the reference genome, extraction of soft-clipped reads, and clustering of IS for subsequent annotation.

GENIS is implemented on the Linux platform with minimal external software dependencies. Users can adjust the required parameters in the provided configuration file. Complete processing of 10 million paired-end reads generated by targeted sequencing, from raw reads to annotation, takes about 30 minutes. For LAM-PCR data, 30 million reads (from 50 different PCRs) are sorted in about 30 minutes, and the remaining processing to obtain annotated IS also takes approximately 30 minutes for 15 million reads. Three final files present the results of the analysis and contain: 1) the read ID, chromosome position (genomic IS), vector position (vector IS), sequence, genomic and vector orientation, and sequence span; 2) all clustered IS with their respective sequence counts; and 3) the annotated IS with respect to nearby genomic features, including gene identifier and gene name, transcription start site, coding region start and end sites, etc. Our tool is well suited to in-depth quantitative analysis of the biosafety and transduction efficiency of viral vectors.
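
The abstract does not describe how GENIS implements its modules, so the following is only an illustrative sketch of two of the steps it names: locating a vector-genome junction from a soft-clipped alignment, and clustering nearby IS with read counts. It assumes SAM-style input (1-based `POS` and a CIGAR string). The function names, the `min_clip` threshold of 20 bases, and the 10 bp clustering `window` are hypothetical choices made for this example, not GENIS parameters.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def junction_from_softclip(pos, cigar, min_clip=20):
    """Return (genomic_position, clipped_side) for a soft-clipped read, else None.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    min_clip is an assumed threshold, not a GENIS setting.
    """
    # Hard clips carry no sequence, so drop them before checking the read ends.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return None
    # Operations that consume reference bases determine the aligned span.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    left_clip = ops[0][0] if ops[0][1] == "S" else 0
    right_clip = ops[-1][0] if ops[-1][1] == "S" else 0
    if left_clip >= min_clip and left_clip >= right_clip:
        return pos, "left"                  # junction at first aligned base
    if right_clip >= min_clip:
        return pos + ref_len - 1, "right"   # junction at last aligned base
    return None

def _summarize(members):
    """Collapse one cluster to (chrom, most frequent position, read count)."""
    chrom = members[0][0]
    counts = Counter(p for _, p in members)
    # Ties on frequency are broken by taking the smallest position.
    rep = min(counts, key=lambda p: (-counts[p], p))
    return chrom, rep, len(members)

def cluster_sites(sites, window=10):
    """Group (chrom, pos) sites lying within `window` bp of the previous site."""
    clusters, current = [], []
    for chrom, pos in sorted(sites):
        if current and (chrom != current[0][0] or pos - current[-1][1] > window):
            clusters.append(_summarize(current))
            current = []
        current.append((chrom, pos))
    if current:
        clusters.append(_summarize(current))
    return clusters

if __name__ == "__main__":
    print(junction_from_softclip(1000, "30S70M"))
    print(cluster_sites([("chr1", 100), ("chr1", 102), ("chr1", 102), ("chr1", 500)]))
```

Sorting the sites first lets the clustering run in a single pass. A production pipeline would also need to track strand and vector orientation, which the abstract lists among the reported fields.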