Network transfer miscalculation
5 August 2010
Genome Biology: Letter to Editor
Dear Editor,
Lincoln Stein's excellent article on Cloud Computing in the latest issue of Genome Biology is a timely and insightful analysis of the promise of cloud computing for bioinformatics. As founders of a cloud-based DNA sequence analysis service, DNAnexus.com, we wholeheartedly agree that cloud bioinformatics is here to stay, as it translates cost-effectiveness and scalability into real-world time and resource savings for anyone dealing with large genomics datasets.
A key parameter for the viability of the cloud model for bioinformatics, for academic and commercial efforts alike, is whether standard networks are fast enough to support the upload of the large data files produced by sequencing machines. Dr. Stein's calculations suggest that network speeds are the major obstacle to widespread adoption. He states:
"For genomics, the biggest obstacle to moving to the cloud may well be network bandwidth. A typical research institution will have network bandwidth of about a gigabit/second (roughly 125 megabytes/second). On a good day this will support sustained transfer rates of 5 to 10 megabytes/second across the internet. Transferring a 100 gigabyte next-generation sequencing data file across such a link will take about a week in the best case."
We were struck by these numbers. For us, uploading a 1 gigabyte file, which corresponds to a typical single-lane fastq file from an Illumina GAIIx machine, takes little more than a minute. That 100 gigabytes should take a week seemed inconsistent with our first-hand experience. Indeed, upon further examination, we noticed that the calculations are off by a factor of about 60. To keep with the quoted example, transferring a 100 gigabyte next-generation sequencing data file across a network that supports real speeds of 10 megabytes per second will take 10,000 seconds, or about 3 hours, which is about 1/60 of a week.
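The arithmetic in the paragraph above can be checked with a short sketch. It assumes decimal units (1 gigabyte = 1,000 megabytes) and a perfectly sustained transfer rate with no protocol overhead; the function name `transfer_seconds` is purely illustrative and not part of any tool mentioned in this letter.

```python
# Illustrative check of the transfer-time arithmetic in the letter.
# Assumption: decimal units (1 gigabyte = 1,000 megabytes) and a constant
# sustained rate, ignoring protocol overhead and retries.

SECONDS_PER_HOUR = 3600
SECONDS_PER_WEEK = 7 * 24 * SECONDS_PER_HOUR


def transfer_seconds(file_gigabytes: float, rate_megabytes_per_s: float) -> float:
    """Return the time in seconds to move a file at a sustained rate."""
    return file_gigabytes * 1000 / rate_megabytes_per_s


best_case = transfer_seconds(100, 10)  # 100 GB at 10 MB/s
slow_case = transfer_seconds(100, 5)   # 100 GB at 5 MB/s

print(f"10 MB/s: {best_case:.0f} s = {best_case / SECONDS_PER_HOUR:.1f} h")
print(f" 5 MB/s: {slow_case:.0f} s = {slow_case / SECONDS_PER_HOUR:.1f} h")
print(f"one week / best case = {SECONDS_PER_WEEK / best_case:.1f}")
```

Under these assumptions the best case comes to 10,000 seconds (about 2.8 hours), and one week is roughly 60 times longer, which matches the factor quoted in the letter. Even at the lower quoted rate of 5 megabytes per second, the transfer would take under 6 hours.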
In our experience, network speed is the first issue potential users mention, usually in a skeptical way, when they learn about DNAnexus. We hope that our clarification, in the context of Dr. Stein's article, will help to dissipate this skepticism by making users realize that bandwidth is not an issue in most settings, where a modest number of sequencing machines feed data over a standard network.
Sincerely,
Andreas Sundquist, Principal Founder and CEO, DNAnexus
Serafim Batzoglou, Co-founder, DNAnexus, and Associate Professor of Computer Science, Stanford University
Arend Sidow, Co-founder, DNAnexus, and Associate Professor of Pathology and of Genetics, Stanford University
Competing interests
None declared
Response to Andreas Sundquist
Lincoln Stein, Ontario Institute for Cancer Research
28 September 2010
I wish to thank Andreas Sundquist and colleagues for identifying a careless and significant error in my calculations of network transfer time for a 100 gigabyte sequencing file. I have repeated the calculation and confirm Sundquist's estimates.
Competing interests
none
Community and cloud computing
Dawn Field, Centre for Ecology and Hydrology, UK
20 January 2011
Great paper and love the idea of the genomic informatics world as an ecosystem. Yes, it will certainly change and perhaps the biggest benefit will be more of a chance to share expertise/tools/data. The cloud should help naturally support collaboration and community-led projects. For example, great to see JCVI Cloud Bio-Linux cited - we provide NEBC Bio-Linux* upon which the cloud image is built: http://nebc.nox.ac.uk/biolinux.html
Very happy to see our project/packages being put into the cloud by a third party. This is the vision of the new ecosystem.
* Dawn Field, Bela Tiwari, Tim Booth, Stewart Houten, Dan Swan, Nicolas Bertrand & Milo Thurston. Open software for biologists: from famine to feast. Nature Biotechnology 24, 801-803 (2006). doi:10.1038/nbt0706-801
Abstract: Developing and deploying specialized computing systems for specific research communities is achievable, cost effective and has wide-ranging benefits.
Competing interests
None declared