Development of TBSPG Pipelines for Refining Unique Mapping and Repetitive Sequence Detection Using the Two Halves of Each Illumina Sequence Read

Plant Molecular Biology Reporter (2015)

Abstract

We developed six pipelines (TBSPG) for mapping Illumina sequence reads to reference genomes, refining unique mapping, and computing the mapped read number and coverage. The pipelines let the user choose between multi-mapping and unique mapping, between paired-end read files and a single-end read file as input, between removing and retaining nucleus-organelle shared sequences, and between mapping with full-length reads and mapping with the two halves of each read to refine the detection of unique and non-unique sequences. The TBSPG pipelines are built on, and named after, publicly available tools: Trimmomatic, the Burrows–Wheeler Aligner (BWA), SAMtools, Picard, and the Genome Analysis Toolkit (GATK). We wrote several Perl scripts to fill the gaps between the tools, connect them, recognize half-length reads, select uniquely mapped reads, and output the results in a Microsoft Excel-readable format for studying the read number and coverage per chromosome and organellar genome. In a potato 100-bp paired-end sequence file (Illumina TruSeq), approximately 6.75% of uniquely mapped full-length reads were found to contain non-unique sequences at the half-length-read level. These freely available TBSPG pipelines can be used for many read-based applications, including repetitive sequence analysis and organellar genome copy number estimation.
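
The half-read idea can be illustrated with a minimal Python sketch. It is not the authors' Perl code: the function names (`split_read`, `classify_read`, `fraction_non_unique`), the `_h1`/`_h2` naming suffixes, the rule for odd read lengths, and the classification labels are all assumptions made for illustration, and the per-half hit counts are assumed to come from a separate alignment step that is not shown.

```python
from typing import Dict, List, Tuple


def split_read(name: str, seq: str, qual: str) -> List[Tuple[str, str, str]]:
    """Split one read into two halves (hypothetical naming: _h1 / _h2).

    For odd lengths the first half receives the extra base (an assumption).
    """
    mid = (len(seq) + 1) // 2
    return [
        (f"{name}_h1", seq[:mid], qual[:mid]),
        (f"{name}_h2", seq[mid:], qual[mid:]),
    ]


def classify_read(hits_h1: int, hits_h2: int) -> str:
    """Classify a read from the number of mapping locations of each half."""
    if hits_h1 > 1 or hits_h2 > 1:
        return "non_unique"          # at least one half maps to several places
    if hits_h1 == 1 and hits_h2 == 1:
        return "unique"              # both halves map to exactly one place
    return "partially_unmapped"      # at least one half did not map at all


def fraction_non_unique(half_hits: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of reads whose halves reveal a non-unique sequence.

    `half_hits` maps a read name to (hits of first half, hits of second half)
    for reads that were uniquely mapped at full length.
    """
    if not half_hits:
        return 0.0
    flagged = sum(
        1 for h1, h2 in half_hits.values() if classify_read(h1, h2) == "non_unique"
    )
    return flagged / len(half_hits)
```

Under these assumptions, a read that maps uniquely at full length but has one half with several mapping locations is flagged as `non_unique`, which is the kind of read the 6.75% figure above refers to.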

Keywords

Illumina, Pipeline, Reads, Alignment, Half sequences, Duplicated sequences, Repetitive sequences
