Quality Assessment of High-Throughput DNA Sequencing Data via Range Analysis.
IWBBIO (2018)
Abstract
A number of recent studies have addressed the quality assessment of sequencing data. These efforts have largely focused on reporting statistical parameters of the distribution of the quality scores and/or the base-calls in a FASTQ file. We investigate another dimension of quality assessment, motivated by the fact that reads containing long intervals with few errors improve the performance of post-processing tools in downstream analysis. The quality assessment procedures proposed in this study therefore analyze the segments of the reads that are above a certain quality. We define an interval of a read to be of desired-quality when it contains at most k quality scores less than or equal to a threshold value v, for some k and v provided by the user. We present an algorithm to detect those ranges and introduce new metrics computed from their lengths. For the fragments in each read that qualify under the k and v input parameters, these metrics are the longest, shortest, average, and cubic average fragment lengths, the coefficient of variation of the lengths, and the number of segments, each reported as a mean over the reads. We also provide a new software tool, QASDRA, for quality assessment of sequencing data via range analysis, which is available at https://github.com/ali-cp/QASDRA.git. QASDRA creates a quality assessment report for an input FASTQ file according to the user-specified k and v parameters. It can also filter out reads according to the metrics introduced.
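The desired-quality definition above can be illustrated with a short Python sketch. It is not the algorithm from the paper and not QASDRA's code: the function names (`phred_scores`, `desired_quality_intervals`, `length_summary`) are hypothetical, the Phred+33 offset is an assumption about the input encoding, and reporting every maximal interval (so intervals may overlap when k > 0) is one possible reading of "range". Only a few of the length metrics are computed; the cubic average and coefficient of variation are left out because the abstract does not define them.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def desired_quality_intervals(scores, k, v):
    """Return maximal half-open intervals (start, end) that contain
    at most k scores less than or equal to v.

    Assumption: intervals are maximal and may overlap when k > 0.
    """
    n = len(scores)
    if n == 0:
        return []
    # Positions of the "low" scores, i.e. those at or below the threshold.
    low = [i for i, q in enumerate(scores) if q <= v]
    if len(low) <= k:
        return [(0, n)]  # the whole read qualifies
    # Sentinels make the first and last windows uniform to handle.
    bounds = [-1] + low + [n]
    intervals = []
    for j in range(len(bounds) - k - 1):
        # The window holds exactly the k low positions bounds[j+1..j+k].
        start, end = bounds[j] + 1, bounds[j + k + 1]
        if end > start:  # skip empty windows between adjacent low scores
            intervals.append((start, end))
    return intervals


def length_summary(intervals):
    """Summarize interval lengths for one read (hypothetical metric subset)."""
    lengths = [end - start for start, end in intervals]
    if not lengths:
        return {"count": 0, "longest": 0, "shortest": 0, "average": 0.0}
    return {
        "count": len(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "average": mean(lengths),
    }


if __name__ == "__main__":
    read_scores = [30, 30, 5, 30, 30, 30, 5, 5, 30]
    found = desired_quality_intervals(read_scores, k=1, v=10)
    print(found)
    print(length_summary(found))
```

With k = 0 the intervals are simply the runs of scores strictly above v; a larger k tolerates isolated low-quality positions inside an otherwise good segment.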
Keywords
DNA sequencing data quality assessment, High-throughput DNA sequencing, Quality score