SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences
CoRR(2023)
Abstract
Computational complexity is a key limitation of genomic analyses. Thus, over
the last 30 years, researchers have proposed numerous fast heuristic methods
that provide computational relief. Comparing genomic sequences is one of the
most fundamental computational steps in genomic analyses. Because this step is
computationally expensive, optimized exact and heuristic algorithms are still
being developed. We find that these methods are highly sensitive to the
underlying data, its quality, and various hyperparameters. Despite their wide
use, no in-depth analysis of this sensitivity has been performed; as a result,
genetic sequences may be falsely discarded from further analysis and
computational costs unnecessarily inflated. We provide the first analysis and
benchmark of this
heterogeneity. We deliver an actionable overview of the 11 most widely used
state-of-the-art methods for comparing genomic sequences. We also inform
readers about their advantages and downsides through a thorough experimental
evaluation on real datasets from all major sequencing technology manufacturers
(i.e., Illumina, ONT, and PacBio). SequenceLab is publicly available at
https://github.com/CMU-SAFARI/SequenceLab.
Keywords
genomic sequences, comprehensive benchmark, computational methods
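To make the exact-versus-heuristic trade-off from the abstract concrete, here is a minimal Python sketch under our own assumptions. It uses Levenshtein edit distance as the exact comparison and a hypothetical shared-k-mer filter as the heuristic. Neither function is taken from SequenceLab or from any of the 11 benchmarked methods; the names `edit_distance`, `kmer_filter_passes`, `k`, and `min_shared` are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance via O(len(a) * len(b)) dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def kmer_filter_passes(a: str, b: str, k: int, min_shared: int) -> bool:
    """Hypothetical heuristic filter (not a SequenceLab method): keep the pair
    only if it shares at least `min_shared` distinct k-mers. It is cheap, but
    its verdict depends on the hyperparameters `k` and `min_shared`."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b) >= min_shared


read = "ACGTACGTTAGC"
ref = "ACGTACCTTAGC"  # one substitution relative to `read`
print(edit_distance(read, ref))                               # 1
print(kmer_filter_passes(read, ref, k=4, min_shared=5))       # True
print(kmer_filter_passes(read, ref, k=4, min_shared=6))       # False
```

The exact computation costs time proportional to the product of the two sequence lengths, while the filter needs only set operations over k-mers. In this toy setting, tightening `min_shared` from 5 to 6 makes the filter discard a pair that differs by a single substitution, a small-scale illustration of how a hyperparameter choice can falsely discard sequences.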