SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data

GENOME BIOLOGY(2021)

引用 12|浏览23
暂无评分
摘要
Background There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. Results We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket. Conclusions Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
更多
查看译文
关键词
Sequencer/instrument error,Error suppression,DNA sequencing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要