SHRiMP: Accurate Mapping of Short Color-space Reads.

PLoS Computational Biology, no. 5 (2009)


Abstract

The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25-70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads ...

Introduction
  • Next Generation Sequencing (NGS) technologies are revolutionizing the study of variation among individuals in a population.
  • The ability of sequencing platforms such as AB SOLiD and Illumina (Solexa) to sequence one billion basepairs or more in a few days has enabled the cheap re-sequencing of human genomes, with the genomes of a Chinese individual [1], a Yoruban individual [2], and matching tumor and healthy samples from a female individual [3] sequenced in the last few months.
  • While matching with up to a few differences is sufficient in these regions, these methods fail when the polymorphism level is high.
Highlights
  • Next Generation Sequencing (NGS) technologies are revolutionizing the study of variation among individuals in a population
  • One of the main application areas of Next generation sequencing (NGS) technologies is the discovery of genomic variation within a given species
  • The first step in discovering this variation is the mapping of reads sequenced from a donor individual to a known ("reference") genome
  • These algorithms often sacrifice sensitivity for fast running time. While they are successful at mapping reads from organisms that exhibit low polymorphism rates, they do not perform well at mapping reads from highly polymorphic organisms
  • We present a novel read mapping method, SHRiMP, that can handle much greater amounts of polymorphism
  • Using Ciona savignyi as our target organism, we demonstrate that our method discovers significantly more variation than other methods
Methods
  • Details of the SHRiMP algorithm: the algorithm starts with a rapid k-mer hashing step to localize potential areas of similarity between the reads and the genome.
  • For each k-mer in the genome, all of the matches of that particular k-mer among the reads are found.
  • If a particular read has at least a specified number of k-mer matches within a given window of the genome, the authors execute a vectorized Smith-Waterman step to score and validate the similarity.
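The three steps above can be sketched roughly as follows. This is a minimal illustration, not the SHRiMP implementation: the function names, the parameter values (`k`, `min_hits`, `window`, `min_score`) and the scoring scheme are all assumptions chosen for the example, and the Smith-Waterman step here is a plain scalar version rather than the vectorized one the authors describe.

```python
from collections import defaultdict

def index_reads(reads, k):
    """Map each k-mer to the set of read ids that contain it."""
    index = defaultdict(set)
    for rid, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(rid)
    return index

def candidate_windows(genome, reads, k, min_hits, window):
    """Scan the genome k-mer by k-mer and yield (read_id, start) whenever
    a read has collected min_hits k-mer hits within `window` bases."""
    index = index_reads(reads, k)
    recent = defaultdict(list)  # read id -> genome positions of recent hits
    for pos in range(len(genome) - k + 1):
        for rid in index.get(genome[pos:pos + k], ()):
            hits = [p for p in recent[rid] if pos - p < window]
            hits.append(pos)
            if len(hits) >= min_hits:
                yield rid, hits[0]
                hits = []  # start counting afresh after reporting
            recent[rid] = hits

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Scalar local-alignment score with a linear gap penalty
    (illustrative scores, not the paper's)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

def map_reads(genome, reads, k=4, min_hits=2, window=20, min_score=20):
    """Return {read_id: (best_score, window_start)} for validated candidates."""
    best = {}
    for rid, start in candidate_windows(genome, reads, k, min_hits, window):
        region = genome[start:start + window + k]
        score = smith_waterman(reads[rid], region)
        if score >= min_score and score > best.get(rid, (0, -1))[0]:
            best[rid] = (score, start)
    return best

genome = "T" * 10 + "ACGTACGGTCATGCA" + "G" * 10
reads = ["ACGTACGGTCATGCA", "CCCCCCCCCCCCCCC"]
hits = map_reads(genome, reads)
```

In this toy run only the first read passes the k-mer filter; the second shares no k-mer with the genome, so the alignment step is never run for it. That is the point of the filter: the expensive alignment is reserved for windows that already look promising.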
Results
  • SHRiMP was able to accurately map >46% of all reads with either 4 SNPs or 5 bp indels, despite the large number of sequencing errors in the dataset (Table 3, doi:10.1371/journal.pcbi.1000386.t003).
  • SHRiMP was able to accurately map 76% of reads with 2 SNPs and 0 indels, at 84% precision, and nearly half of all reads with 2 SNPs and 3 bp indels at 74% precision
Conclusion
  • Next Generation Sequencing (NGS) technologies are revolutionizing the way biologists acquire and analyze genomic data.
  • The first step in discovering this variation is the mapping of reads sequenced from a donor individual to a known ("reference") genome.
  • Since the introduction of NGS technologies, many methods have been devised for mapping reads to reference genomes.
  • These algorithms often sacrifice sensitivity for fast running time.
  • The authors develop color-space extensions to classical alignment algorithms, allowing them to map color-space, or "dibase", reads generated by AB SOLiD sequencers
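For readers unfamiliar with color-space reads, the sketch below illustrates the standard SOLiD dibase encoding, in which each color describes a pair of adjacent bases. With bases coded as A=0, C=1, G=2, T=3, the color is the XOR of the two codes. The function names are illustrative assumptions; the code shows only the encoding and is not the authors' color-space alignment extension.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def to_color_space(seq):
    """Encode a base sequence as one color (0-3) per adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def from_color_space(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    bases = [first_base]
    for c in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ c])
    return "".join(bases)

reference = to_color_space("ACGT")        # [1, 3, 1]
snp = to_color_space("ACTT")              # [1, 2, 0]: a SNP changes two adjacent colors
miscall = from_color_space("A", [1, 0, 1])  # "ACCA": one bad color corrupts every later base
```

The last two lines show why color-space reads benefit from dedicated alignment: a true SNP alters two adjacent colors, while a single miscalled color, if decoded naively, changes every base after it.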
Tables
  • Table 1: Running time of SHRiMP for mapping 500,000 35 bp SOLiD C. savignyi reads to the 180 Mb reference genome on a single Core2 2.66 GHz processor
  • Table 2: Mapping results for 135 million 35 bp SOLiD reads from Ciona savignyi using SHRiMP and the SOLiD mapper provided by Applied Biosystems
  • Table 3: Color-space mapping accuracy of SHRiMP
  • Table 4: Performance (in millions of cells per second) of the various Smith-Waterman implementations, including a regular implementation (not vectorized), Wozniak's diagonal implementation with memory lookups, Farrar's method and our diagonal approach without score lookups
Funding
  • Funding: This work was sponsored by Natural Sciences and Engineering Research Council (NSERC) of Canada Undergraduate Student Research Awards, Canadian Institute for Health Research (CIHR), Applied Biosystems, NSERC Discovery Grant, MITACS, and a Canada Foundation for Innovation equipment grant
References
  • 1. Wang J, Wang W, Li R, Li Y, Tian G, et al. (2008) The diploid genome sequence of an Asian individual. Nature 456: 60–65.
  • 2. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59.
  • 3. Ley TJ, Mardis ER, Ding L, Fulton B, Mclellan MD, et al. (2008) DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456: 66–72.
  • 4. Bowtie. http://bowtie-bio.sourceforge.net.
  • 5. mapreads. http://www.solidsoftwaretools.com/gf/project/mapreads.
  • 6. Maq. http://maq.sourceforge.net.
  • 7. Li M, Ma B, Kisman D, Tromp J (2004) PatternHunter II: highly sensitive and fast homology search. J Bioinform Comput Biol 2: 417–439.
  • 8. Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics.
  • 9. Lin H, Zhang Z, Zhang MQ, Ma B, Li M (2008) ZOOM! Zillions of oligos mapped. Bioinformatics 24: 2431–2437.
  • 10. Ma B, Tromp J, Li M (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics 18: 440–445.
  • 11. Small KS, Brudno M, Hill MM, Sidow A (2007) Extreme genomic variation in a natural population. PNAS 104: 5698–5703.
  • 12. Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9: 225–242.
  • 13. Ondov B, Varadarajan A, Passalacqua KDD, Bergman NHH (2008) Efficient mapping of Applied Biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics (Oxford, England).
  • 14. Rasmussen K, Stoye J, Myers EW (2006) Efficient q-gram filters for finding all ε-matches over a given length. J of Computational Biology 13: 296–308.
  • 15. Califano A, Rigoutsos I (1993) FLASH: a fast look-up algorithm for string homology. Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93, IEEE Computer Society Conference on. pp 353–359.
  • 16. Rognes T, Seeberg E (2000) Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16: 699–706.
  • 17. Farrar M (2007) Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23: 156–161.
  • 18. Wozniak A (1997) Using video-oriented instructions to speed up sequence comparison. Comput Appl Biosci. pp 145–150.
  • 19. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147: 195–197.
  • 20. Yanovsky V, Rumble SM, Brudno M (2008) Read mapping algorithms for single molecule sequencing data. In: WABI. Springer, volume 5251 of Lecture Notes in Computer Science, 38–49. URL http://dblp.uni-trier.de/db/conf/wabi/wabi2008.html.
  • 21. Karlin S, Altschul SF (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A 87: 2264–2268.
  • 22. Small KS, Brudno M, Hill MM, Sidow A (2007) A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome. Genome Biology 8: R41.