The case for cloud computing in genome informatics

Genome Biology, vol. 11, no. 5 (2010): 207

Abstract

With DNA sequencing now getting cheaper more quickly than data storage or computation, the time may have come for genome informatics to migrate to the cloud.

Introduction
  • The impending collapse of the genome informatics ecosystem. Since the 1980s, the authors have had the great fortune to work in a comfortable and effective ecosystem for the production and consumption of genomic information (Figure 1).
  • The archival databases and the value-added genome distributors did not need to worry about running out of disk storage space, because long-term trends allowed them to upgrade their capacity faster than the world’s sequencing labs could update theirs.
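The introduction argues that storage stayed ahead of sequencing output, and that argument compares two exponential trends. The sketch below is a simplification that assumes both storage capacity and data volume grow with constant doubling times. It solves storage0 · 2^(t/Ts) = data0 · 2^(t/Td) for the month t at which the data catches up. The function name, starting sizes and doubling times are hypothetical placeholders, not figures from the paper.

```python
import math


def crossover_months(storage0: float, data0: float,
                     storage_doubling: float, data_doubling: float) -> float:
    """Months until data volume catches up with storage capacity.

    Models capacity as storage0 * 2**(t / storage_doubling) and data as
    data0 * 2**(t / data_doubling). Returns math.inf if data never catches up.
    """
    if data0 >= storage0:
        return 0.0  # data already fills the available storage
    rate_gap = 1 / data_doubling - 1 / storage_doubling
    if rate_gap <= 0:
        return math.inf  # storage grows at least as fast as data
    return math.log2(storage0 / data0) / rate_gap


# Hypothetical scenario: storage starts 100x larger than the data,
# but data doubles every 6 months while storage doubles every 18.
months = crossover_months(storage0=100.0, data0=1.0,
                          storage_doubling=18.0, data_doubling=6.0)
print(f"data overtakes storage after about {months:.1f} months")
```

With these placeholder numbers the program prints about 59.8 months. A 100-fold head start in storage lasts only about five years once data doubles three times as fast as capacity.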
Key points
  • The impending collapse of the genome informatics ecosystem. Since the 1980s, the authors have had the great fortune to work in a comfortable and effective ecosystem for the production and consumption of genomic information (Figure 1).
  • Moore’s Law states that the number of transistors that can be placed on an integrated circuit is increasing exponentially, with a doubling time of roughly 18 months.
  • In cloud computing, a service provider puts up the capital expenditure of creating an extremely large compute and storage farm with all the frills needed to maintain an operation of this size, including a dedicated system administration staff, storage redundancy, data centers distributed to strategically placed parts of the world, and broadband network connectivity.
  • In a cloud-based ecosystem, most datasets are stored in the cloud as virtual disks and databases, rather than as separate copies at diverse locations that each group must copy to its local machines before working with them.
  • In a cloud computing environment, analysis pipelines can be packaged into virtual machine images and stored in a way that lets anyone copy them, run them and customize them for their own needs, avoiding the complexities of software installation and configuration.
  • If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system.
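A worked calculation makes the Moore’s Law point concrete. With the roughly 18-month doubling time quoted in the text, growth after t months is 2^(t/18), and growing by a factor F takes 18 · log2(F) months. The function names below are illustrative only.

```python
import math

DOUBLING_MONTHS = 18.0  # doubling time quoted for Moore's Law in the text


def growth_factor(months: float, doubling: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months` for a fixed doubling time."""
    return 2 ** (months / doubling)


def months_to_reach(factor: float, doubling: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow by `factor` (inverse of growth_factor)."""
    return doubling * math.log2(factor)


print(f"growth over 10 years: {growth_factor(120):.1f}x")
print(f"months to grow 1000x: {months_to_reach(1000):.0f}")
```

This prints a growth factor of about 101.6x over ten years, and about 179 months (roughly 15 years) for a thousandfold increase.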
Results
  • For the value-added genome integrators to do their magic with the data, they must download it from the archival databases across the internet and store copies in their local storage systems.
  • In the traditional economic model of computation, customers purchase server, storage and networking hardware, configure it the way they need, and run software on it.
  • In cloud computing, a service provider puts up the capital expenditure of creating an extremely large compute and storage farm with all the frills needed to maintain an operation of this size, including a dedicated system administration staff, storage redundancy, data centers distributed to strategically placed parts of the world, and broadband network connectivity.
  • In a cloud-based ecosystem, most datasets are stored in the cloud as virtual disks and databases, rather than as separate copies at diverse locations that each group must copy to its local machines before working with them.
  • Web services that run on top of these datasets, including both the primary archives and the value-added integrators, run as virtual machines within the cloud.
  • Researchers who want to analyze the data use the facilities provided by the service provider to configure a virtual machine image that contains the software they wish to run, launch as many copies as they need, mount the disks and databases containing the public datasets they need, and run the analysis.
  • Cloud computing creates a new niche in the ecosystem for genome software developers to package their work in the form of virtual machines.
  • In a cloud computing environment, analysis pipelines can be packaged into virtual machine images and stored in a way that lets anyone copy them, run them and customize them for their own needs, avoiding the complexities of software installation and configuration.
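The contrast between the traditional model (buy hardware, then run software on it) and the cloud model (rent capacity from a provider that has made the capital investment) can be sketched as a simple break-even calculation. This is a minimal illustration under stated assumptions:

- All prices are hypothetical.
- Owned hardware and rented instance-hours are treated as interchangeable capacity.
- Billing is assumed to be strictly proportional to hours used.

None of the class or function names come from the paper or from any provider's API.

```python
from dataclasses import dataclass


@dataclass
class OwnedCluster:
    """Traditional model: buy hardware up front and pay to keep it running."""
    purchase_cost: float   # hypothetical capital outlay
    yearly_upkeep: float   # hypothetical power, space and admin costs
    lifetime_years: float

    def yearly_cost(self) -> float:
        # Amortize the purchase over the hardware's lifetime.
        return self.purchase_cost / self.lifetime_years + self.yearly_upkeep


@dataclass
class CloudPricing:
    """Cloud model: pay only for the instance-hours actually used."""
    price_per_instance_hour: float  # hypothetical rate

    def job_cost(self, instances: int, hours_each: float) -> float:
        return instances * hours_each * self.price_per_instance_hour


def break_even_hours(owned: OwnedCluster, cloud: CloudPricing) -> float:
    """Instance-hours per year at which renting costs as much as owning."""
    return owned.yearly_cost() / cloud.price_per_instance_hour


owned = OwnedCluster(purchase_cost=30_000, yearly_upkeep=5_000, lifetime_years=3)
cloud = CloudPricing(price_per_instance_hour=0.50)
print(f"break-even: {break_even_hours(owned, cloud):,.0f} instance-hours/year")

# More copies finish sooner at the same total cost, assuming the job
# splits evenly and billing is strictly proportional to hours used.
print(cloud.job_cost(instances=1, hours_each=100),
      cloud.job_cost(instances=50, hours_each=2))
```

With these placeholder figures the owned cluster costs 15,000 per year, so renting is cheaper below 30,000 instance-hours per year. The second line prints the same cost for one instance running 100 hours and for 50 instances running 2 hours each. That is why launching as many copies as needed can shorten an analysis without raising its price, provided the work splits evenly.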
Conclusion
  • You can establish an account with Amazon Web Services or one of the other commercial vendors, launch a virtual machine instance from a wide variety of generic and bioinformatics-oriented images and attach any one of several large public genome-oriented datasets.
  • If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system.
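How large datasets get into the cloud is largely a question of bandwidth. The back-of-the-envelope sketch below compares uploading over the network with physically shipping drives. It assumes a hypothetical dataset size, link speed, achieved-bandwidth efficiency and courier time. These figures and the function names are illustrative assumptions, not numbers from the paper.

```python
def network_transfer_days(terabytes: float, megabits_per_second: float,
                          efficiency: float = 0.8) -> float:
    """Days to upload `terabytes` over a link of the given nominal speed.

    `efficiency` is an assumed fraction of nominal bandwidth actually achieved.
    Uses decimal units: 1 TB = 1e12 bytes, 1 Mbit/s = 1e6 bits per second.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 86_400


def faster_to_ship(terabytes: float, megabits_per_second: float,
                   shipping_days: float) -> bool:
    """True if sending drives by courier beats the network upload."""
    return shipping_days < network_transfer_days(terabytes, megabits_per_second)


# Hypothetical case: 100 TB of sequence data over a 1 Gbit/s link,
# versus drives that take 3 days to arrive.
print(f"{network_transfer_days(100, 1000):.1f} days over the network")
print("ship the disks" if faster_to_ship(100, 1000, 3) else "use the network")
```

Take 100 TB over a 1 Gbit/s link at 80% efficiency. The upload takes about 11.6 days, so drives arriving in 3 days win. For a 1 TB dataset, the network finishes in under three hours.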
Cite this article as: Stein LD: The case for cloud computing in genome informatics. Genome Biology 2010, 11:207. doi:10.1186/gb-2010-11-5-207