Skip to main content

Table 1 Estimates of a whole-genome sequence and an RNA-seq experiment using Illumina HiSeq 2000 machine

From: The real cost of sequencing: higher than you think!

  

Whole genome sequencing

 

RNA-Seq

  

2011 cost

output

2011 time

 

2011 cost

output

2011 time

Sample collection and experimental design

from blood samples (easy to collect) to brain tissue (hard to collect)

~$100 onwards

 

from a few hours to several days

same as for whole genome sequencing

Sequencing

library preparation + running the sequencer (whole dual flow cell)

~$6500 = ~$500 + ~$6000

~380 M reads/lane; 1 individual: ~1140 M total reads (~3 lanes for a 30 × coverage); ~250 Gb (intermediate files)

~11-12 day

library preparation + running the sequencer (whole dual flow cell)

~$3300 = ~$300 + ~$3000

~380 M reads/lane

~12-14 day

 

Data storage, low-level processing

       
 

Alignment (transfer* and storing raw data + mapping)

~$40 = ~$33 + ~$7

300 Gb (BAM file)

~1/2 day *** (including transferring 250 Gb FASTQ ~7.5 hrs)

Alignment (transfer* and storing raw data + mapping)

~$5 = ~$3 + ~$2

~30 Gb (BAM); ~22 Gb (MRF)

< 2 hrs ***

 

(data transfer and storage for 10 days)*; **

~$40

 

~8.5 hrs

(data transfer and storage for 10 days)*; **

< $4

 

< 1 hr

Data reduction and management

High-level summaries***

       
 

SNP calling (compute + transfer out)

< $5 = ~$4 + ~$0.60

< 1 Gb

~3 hrs

Gene and exon

expression quantification

< $1

< 1 M

< 1 hr (1 CPU)

 

Indel calling (compute + transfer out)

< $35 = ~$32 + ~$0.60

< 1 Gb

~1 day

Isoform quantification

~$6

< 1 M

~4 h

 

SV calling (compute + transfer out)

< $35 = ~$32 + ~$0.60

< 1 Gb

~1 day

    

Downstream analyses

 

> $100 K

~310 Gb

months

 

> $100 K

~30 Gb

months

Total of sequencing, data management and reduction

~$6500

~310 Gb

~15 days

 

~3500

~30 Gb

~12-14 days

  1. aAssuming a 10 MB/s transfer rate. bThe cost of transferring 300 GB (BAM file) if the mapping is performed locally. cSixteen CPUs were used for all calculations, unless specified. BAM, Binary Sequence Alignment/Map; CPU, central processing unit; GB, gigabyte; MB, megabyte; SNP, single nucleotide polymorphism; ~, approximately. The table gives an estimation of the current costs of a whole-genome sequencing experiment and a functional genomic experiment (RNA-seq) using an Illumina HiSeq 2000 machine. The cost of sequencing is based on that reported by the Center for Cancer Research [53]. It is assumed that the same human sample is used for the genome sequence as well as for the RNA-seq experiment. In this scenario, a group of four technicians and bioinformaticians can run the entire pipeline. The costs related to data processing are estimated by considering all tasks performed in the Amazon Web Services 'cloud' environment. Pricing is based on a 'US standard' of the S3 Amazon Services (storage) [54] and a cost of $0.68 per hour (US, East Virginia) for the use of the Amazon EC2 (computation) [55] (July 2011). See Additional file 1 for a version of this table in a colored layout for easier reading.