recount3: summaries and queries for large-scale RNA-seq expression and splicing

Table 2 Monorail performance metrics for runs on TACC, AWS and MARCC

Metric	Human SRA TACC	Human SRA AWS	Mouse SRA TACC	Mouse SRA AWS	Human GTEx MARCC	Human TCGA MARCC	Totals
Sequencing Runs Processed	286,000	27,618	304,131	109,889	19,214	11,348	758,200
Compressed input size (TBs)	441.78	44.2	236.28	111.873	81	75	990.133
Compressed output size (TBs)	64.81	6.5	39.7	16.7	11.6	7.0	146.31
Node hours (NHs)	10,133	798	8,179	5,967	2,421	1,467	28,965
NHs per sequencing run	0.035	0.029	0.027	0.054	0.126	0.129	0.038
NHs per compressed input TB	22.9	18.1	34.6	53.3	29.9	19.6	29.3
Sequencing runs per NH	28	35	37	18	8	8	26
Compressed input TB per NH	0.044	0.055	0.029	0.019	0.033	0.051	0.034

Statistics for GTEx and TCGA were extrapolated from a subset of each project (9,277 and 1,567 samples, respectively). GTEx output size is larger because whole BAM files were kept for a subset of the samples. The counts tally run accessions processed, which can exceed the numbers in Table 1 because some runs were processed multiple times and because some runs were later removed for QC or metadata reasons. Missing from this table are several thousand SRA human run accessions that were analyzed on MARCC but whose log files were discarded.
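
The four ratio rows of Table 2 follow arithmetically from the sequencing-run, compressed-input and node-hour rows. The Python sketch below recomputes them as a check. It is not part of Monorail or recount3: the function and variable names are hypothetical, and the only inputs are the values copied from the table. The Totals column appears consistent with a ratio of the summed rows rather than a mean of the per-column ratios, and the sketch assumes that interpretation.

```python
# Primary rows of Table 2, copied from the table (columns in table order).
COLUMNS = ["Human SRA TACC", "Human SRA AWS", "Mouse SRA TACC",
           "Mouse SRA AWS", "Human GTEx MARCC", "Human TCGA MARCC"]
RUNS = [286_000, 27_618, 304_131, 109_889, 19_214, 11_348]
INPUT_TB = [441.78, 44.2, 236.28, 111.873, 81, 75]
NODE_HOURS = [10_133, 798, 8_179, 5_967, 2_421, 1_467]


def derived_metrics(runs, input_tb, node_hours):
    """Return the four ratio rows for one column (hypothetical helper)."""
    return {
        "nh_per_run": node_hours / runs,
        "nh_per_input_tb": node_hours / input_tb,
        "runs_per_nh": runs / node_hours,
        "input_tb_per_nh": input_tb / node_hours,
    }


def build_table():
    """Recompute the ratio rows for every column, plus the Totals column."""
    table = {
        name: derived_metrics(r, tb, nh)
        for name, r, tb, nh in zip(COLUMNS, RUNS, INPUT_TB, NODE_HOURS)
    }
    # Assumption: Totals is a ratio of sums, not a mean of per-column ratios.
    table["Totals"] = derived_metrics(sum(RUNS), sum(INPUT_TB), sum(NODE_HOURS))
    return table


if __name__ == "__main__":
    for name, m in build_table().items():
        print(f"{name:18s} {m['nh_per_run']:.3f} {m['nh_per_input_tb']:.1f} "
              f"{m['runs_per_nh']:.0f} {m['input_tb_per_nh']:.3f}")
```

Rounded to the precision used in the table, the recomputed values should match the published ratio rows, which makes this a quick consistency check when the table is transcribed or extended.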

ISSN: 1474-760X