Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

APPLIED AND ENVIRONMENTAL MICROBIOLOGY, no. 7 (2006): 5069-5072

Abstract

A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were...

Introduction
  • A 16S rRNA gene database addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
  • Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates.
  • Because the rate of production of 16S small-subunit rRNA gene sequence records for uncultured organisms exceeds the rate of production for their cultured counterparts, taxonomic placement of sequences lags behind.
Highlights
  • Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates
  • Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria
  • We found that there is incongruent taxonomic nomenclature among curators even at the phylum level
  • We plan to develop and implement a number of community curation tools. This should allow the user community to actively participate in improving the quality of the Greengenes database and should ensure that time-consuming manual improvements of sequence and sequence-associated data, including taxonomic corrections, are propagated for the benefit of the whole community
  • Five curation tools that should capture manual improvements are in development: (i) improvements in individual sequence alignments, (ii) manual verification of putative chimeras, (iii) recruitment of novel lineages to the Core Set, (iv) corrections in the Greengenes description, and (v) updating taxonomic group names
Results
  • For a small sample of 1,399 sequence records from known phyla, it was estimated that 3% of the public data might contain chimeras [2].
  • Greengenes maintains a consistent multiple-sequence alignment (MSA) of both archaeal and bacterial 16S small-subunit rRNA genes to facilitate taxonomic placement.
  • To illustrate the utility of the Greengenes data assembly process and to examine the validity of prokaryotic candidate phyla, the authors aligned and chimera checked more than 90,000 public 16S small-subunit rRNA gene sequences.
  • Three studies have submitted more than 1,000 full-length clones; the authors expect the number of large 16S small-subunit rRNA gene surveys to increase due to the availability and falling cost of high-throughput sequencing.
  • Thousands of full-length 16S small-subunit rRNA gene-annotated GenBank records were only partially aligned using NAST.
  • Chimeras are a fundamental problem when they are used as templates with probe selection software, a growing concern with the recent increase in 16S small-subunit rRNA gene microarray probe development [3, 8, 11].
  • Chimera test results from Greengenes allow greater control over input to probe selection software, should aid in avoiding artificial terminal restriction fragment length polymorphism pattern predictions from ARB-compatible TRF-CUT [25], and can increase the accuracy of sampling rarefaction curves [26].
  • Greengenes offers annotated, chimera-checked, full-length 16S rRNA gene sequences in standard alignment formats.
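The points above about chimera test results and a consistent alignment can be illustrated with a minimal sketch. It assumes each record carries a chimera flag and a gapped sequence from a fixed-width alignment. The record type, field names, and the toy width of 6 are hypothetical and are not part of any Greengenes or ARB interface; the actual alignment width reported for Greengenes is far larger.

```python
from dataclasses import dataclass


@dataclass
class AlignedRecord:
    accession: str          # hypothetical record identifier
    aligned_seq: str        # gapped sequence taken from a fixed-width alignment
    putative_chimera: bool  # hypothetical flag set by a chimera screen


def usable_templates(records, width):
    """Drop flagged chimeras and insist that every kept record has the same width."""
    kept = []
    for rec in records:
        if rec.putative_chimera:
            continue  # never hand a putative chimera to downstream tools
        if len(rec.aligned_seq) != width:
            raise ValueError(
                f"{rec.accession}: expected width {width}, got {len(rec.aligned_seq)}"
            )
        kept.append(rec)
    return kept


# Toy data only; real alignments are thousands of columns wide.
records = [
    AlignedRecord("rec1", "AC-GT-", False),
    AlignedRecord("rec2", "ACCGT-", True),
    AlignedRecord("rec3", "A--GTT", False),
]
templates = usable_templates(records, width=6)
print([rec.accession for rec in templates])
```

Filtering before probe selection, rather than after, keeps a chimeric template from ever contributing a candidate probe. The width check is a cheap guard that all records really come from the same alignment.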
Conclusion
  • Planned community curation tools should allow the user community to actively participate in improving the quality of the Greengenes database and should ensure that time-consuming manual improvements of sequence and sequence-associated data, including taxonomic corrections, are propagated for the benefit of the whole community.
  • Five curation tools that should capture manual improvements are in development: (i) improvements in individual sequence alignments, (ii) manual verification of putative chimeras, (iii) recruitment of novel lineages to the Core Set, (iv) corrections in the Greengenes description, and (v) updating taxonomic group names.
  • For a suggested alignment alteration, the submitted sequence must (i) match the existing sequence, (ii) preserve the location of highly conserved positions in the 16S rRNA gene, and (iii) record the curator information as part of the update transaction.
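The three requirements for a suggested alignment alteration could be checked along the following lines. This is a sketch under stated assumptions, not the Greengenes implementation: the function name, the gap characters, the use of column indices to represent conserved positions, and the extra width check are all hypothetical choices made for illustration.

```python
def strip_gaps(aligned):
    """Remove gap characters so only the underlying residues are compared."""
    return aligned.replace("-", "").replace(".", "").upper()


def validate_alignment_edit(existing, proposed, conserved_columns, curator):
    """Return a list of problems; an empty list means all checks passed.

    conserved_columns is a hypothetical list of 0-based alignment columns
    assumed to hold highly conserved positions.
    """
    problems = []
    # (i) the residues must be unchanged; only gap placement may differ
    if strip_gaps(existing) != strip_gaps(proposed):
        problems.append("ungapped sequence differs from the existing record")
    # Assumption: an edit may not change the overall alignment width
    if len(existing) != len(proposed):
        problems.append("alignment width changed")
    else:
        # (ii) conserved columns must hold the same character as before
        for col in conserved_columns:
            if existing[col] != proposed[col]:
                problems.append(f"conserved column {col} changed")
    # (iii) the update must carry curator information
    if not curator.strip():
        problems.append("curator information missing")
    return problems


existing = "AC-GT-A"
proposed = "ACG-T-A"
print(validate_alignment_edit(existing, proposed, conserved_columns=[0, 6], curator="jdoe"))
```

Returning a list of problems instead of a single boolean lets a curation tool report every failed requirement in one pass.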
Funding
  • Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates
  • Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria
  • Comparative analysis of 16S small-subunit rRNA genes is commonly used to survey the constituents of microbial communities, to infer bacterial and archaeal evolution, and to design monitoring and analysis tools, such as microarrays
  • 43% of full-length 16S small-subunit rRNA gene records in the GenBank database are amalgamated into the pseudodivisions “environmental samples” and “unclassified.” Annotation styles are inconsistent, creating barriers for computational categorization of biological sources
  • Future versions of NAST could be altered to allow alignment extensions across regions having low template similarity or to allow candidates to be aligned in sections using divergent templates. Both of these options may allow a greater abundance of chimeric data to be imported into Greengenes but perhaps would capture novel phyla from the public repositories
References
  • APPLIED AND ENVIRONMENTAL MICROBIOLOGY, July 2006, Vol. 72, No. 7, p. 5069–5072. 0099-2240/06/$08.00+0. doi:10.1128/AEM.03006-05
  • A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.
  • To illustrate the utility of the Greengenes data assembly process and to examine the validity of prokaryotic candidate phyla, we aligned and chimera checked more than 90,000 public 16S small-subunit rRNA gene sequences. Taxonomic classifications from the major curators were used when such classifications were available. Sequence data were imported from NCBI for complete or nearly complete gene sequences (length, >1,250 nucleotides) deposited as of 2 April 2006. Alignment of both archaeal and bacterial sequences was performed with the NAST aligner (8) against a “Core Set” of templates selected from a phylogenetically broad collection (16). The resulting MSA was formatted so that each sequence occupied a consistent 7,682 characters or 4,182 characters; the latter allowed
  • FIG. 1. 16S rRNA gene sequencing projects that produced more than 200 full-length records. All projects were submitted to GenBank between October 2000 and February 2006. Sequences were generated from gastrointestinal (GI), soil (SO), vaginal (VG), aerosol (AR), culture collection (CC), insect (IN), water (WA), waste treatment (WT), and fecal (FC) sources as indicated on the x axis. The projects are ordered by sequence count.
  • The computational infrastructure was provided in part by the Virtual Institute for Microbial Stress and Survival (http://VIMSS.lbl.gov) supported by the U.S. Department of Energy Office of Science Office of Biological and Environmental Research Genomics:GTL Program and the Natural and Accelerated Bioremediation Research Program through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy. Web application development was funded in part by the Department of Homeland Security under grant HSSCHQ04X00037.
  • 1. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
  • 2. Ashelford, K. E., N. A. Chuzhanova, J. C. Fry, A. J. Jones, and A. J. Weightman. 2005. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl. Environ. Microbiol. 71:7724–7736.
  • 3. Ashelford, K. E., A. J. Weightman, and J. C. Fry. 2002. PRIMROSE: a computer program for generating and estimating the phylogenetic range of 16S rRNA oligonucleotide probes and primers in conjunction with the RDP-II database. Nucleic Acids Res. 30:3481–3489.
  • 4. Brodie, E., S. Edwards, and N. Clipson. 2002. Bacterial community dynamics across a floristic gradient in a temperate upland grassland ecosystem. Microb. Ecol. 44:260–270.
  • 5. Castiglioni, B., E. Rizzi, A. Frosini, K. Sivonen, P. Rajaniemi, A. Rantala, M. A. Mugnai, S. Ventura, A. Wilmotte, C. Boutte, S. Grubisic, P. Balthasart, C. Consolandi, R. Bordoni, A. Mezzelani, C. Battaglia, and G. De Bellis. 2004. Development of a universal microarray based on the ligation detection reaction and 16S rRNA gene polymorphism to target diversity of cyanobacteria. Appl. Environ. Microbiol. 70:7161–7172.
  • 6. Clamp, M., J. Cuff, S. M. Searle, and G. J. Barton. 2004. The Jalview Java alignment editor. Bioinformatics 20:426–427.
  • 7. Cole, J. R., B. Chai, R. J. Farris, Q. Wang, S. A. Kulam, D. M. McGarrell, G. M. Garrity, and J. M. Tiedje. 2005. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 33:D294–D296.
  • 8. DeSantis, T. Z., I. Dubosarskiy, S. R. Murray, and G. L. Andersen. 2003. Comprehensive aligned sequence construction for automated design of effective probes (CASCADE-P) using 16S rDNA. Bioinformatics 19:1461–1468.
  • 9. DeSantis, T. Z., P. Hugenholtz, K. Keller, E. L. Brodie, N. Larsen, Y. M. Piceno, R. Phan, and G. L. Andersen. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res., in press.
  • 10. DeSantis, T. Z., C. E. Stone, S. R. Murray, J. P. Moberg, and G. L. Andersen. 2005. Rapid quantification and taxonomic classification of environmental DNA from both prokaryotic and eukaryotic origins using a microarray. FEMS Microbiol. Lett. 245:271–278.
  • 11. Emrich, S. J., M. Lowe, and A. L. Delcher. 2003. PROBEmer: a web-based software tool for selecting optimal DNA oligos. Nucleic Acids Res. 31:3746–3750.
  • 12. Felsenstein, J. 1989. PHYLIP—Phylogeny Inference Package (version 3.65). Cladistics 5:164–166.
  • 13. Harris, J. K., S. T. Kelley, and N. R. Pace. 2004. New perspective on uncultured bacterial phylogenetic division OP11. Appl. Environ. Microbiol. 70:845–849.
  • 14. Harris, J. K., S. T. Kelley, G. B. Spiegelman, and N. R. Pace. 2003. The genetic core of the universal ancestor. Genome Res. 13:407–412.
  • 15. Huber, T., G. Faulkner, and P. Hugenholtz. 2004. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics 20:2317–2319.
  • 16. Hugenholtz, P. 2002. Exploring prokaryotic diversity in the genomic era. Genome Biol. 3:1–8.
  • 17. Kelly, J. J., S. Siripong, J. McCormack, L. R. Janus, H. Urakawa, S. El Fantroussi, P. A. Noble, L. Sappelsa, B. E. Rittmann, and D. A. Stahl. 2005. DNA microarray detection of nitrifying bacterial 16S rRNA in wastewater treatment plant samples. Water Res. 39:3229–3238.
  • 18. Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.
  • 19. Lane, D. J., A. P. Harrison, Jr., D. Stahl, B. Pace, S. J. Giovannoni, G. J. Olsen, and N. R. Pace. 1992. Evolutionary relationships among sulfur- and iron-oxidizing eubacteria. J. Bacteriol. 174:269–278.
  • 20. Lehner, A., A. Loy, T. Behr, H. Gaenge, W. Ludwig, M. Wagner, and K. H. Schleifer. 2005. Oligonucleotide microarray for identification of Enterococcus species. FEMS Microbiol. Lett. 246:133–142.
  • 21. Ludwig, W., O. Strunk, R. Westram, L. Richter, H. Meier, Yadhukumar, A. Buchner, T. Lai, S. Steppi, G. Jobb, W. Forster, I. Brettske, S. Gerber, A. W. Ginhart, O. Gross, S. Grumann, S. Hermann, R. Jost, A. Konig, T. Liss, R. Lussmann, M. May, B. Nonhoff, B. Reichel, R. Strehlow, A. Stamatakis, N. Stuckmann, A. Vilbig, M. Lenke, T. Ludwig, A. Bode, and K. H. Schleifer. 2004. ARB: a software environment for sequence data. Nucleic Acids Res. 32:1363–1371.
  • 22. Maidak, B. L., J. R. Cole, T. G. Lilburn, C. T. Parker, Jr., P. R. Saxman, R. J. Farris, G. M. Garrity, G. J. Olsen, T. M. Schmidt, and J. M. Tiedje. 2001. The RDP-II (Ribosomal Database Project). Nucleic Acids Res. 29:173–174.
  • 23. Pace, N. R. 1997. A molecular view of microbial diversity and the biosphere. Science 276:734–740.
  • 24. Radosevich, J. L., W. J. Wilson, J. H. Shinn, T. Z. DeSantis, and G. L. Andersen. 2002. Development of a high-volume aerosol collection system for the identification of air-borne micro-organisms. Lett. Appl. Microbiol. 34:162–167.
  • 25. Ricke, P., S. Kolb, and G. Braker. 2005. Application of a newly developed ARB software-integrated tool for in silico terminal restriction fragment length polymorphism analysis reveals the dominance of a novel pmoA cluster in a forest soil. Appl. Environ. Microbiol. 71:1671–1673.
  • 26. Schloss, P. D., and J. Handelsman. 2004. Status of the microbial census. Microbiol. Mol. Biol. Rev. 68:686–691.
  • 27. Stamatakis, A., T. Ludwig, and H. Meier. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463.
  • 28. Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.
  • 29. Webster, G., C. J. Newberry, J. C. Fry, and A. J. Weightman. 2003. Assessment of bacterial community structure in the deep sub-seafloor biosphere by 16S rDNA-based techniques: a cautionary tale. J. Microbiol. Methods 55:155–164.
  • 30. Wilson, K. H., W. J. Wilson, J. L. Radosevich, T. Z. DeSantis, V. S. Viswanathan, T. A. Kuczmarski, and G. L. Andersen. 2002. High-density microarray of small-subunit ribosomal DNA probes. Appl. Environ. Microbiol. 68:2535–2541.