Figure 2 | Genome Biology

Figure 2

From: Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics

Churchill scales efficiently, enabling complete secondary analysis to be achieved in less than two hours. The capability of Churchill, GATK-Queue and HugeSeq to scale analysis beyond a single compute node was evaluated. (A) Fold speedup as a function of the number of cores used was assessed across a cluster of four Dell® R815 servers with Churchill (green), GATK-Queue (blue), HugeSeq (red) and serial analysis (yellow). For comparison, the linear speedup (grey) and that predicted by Amdahl’s law (purple) assuming a one-hour sequential time are also included [11]. Churchill’s scalability closely matches that predicted by Amdahl’s law, achieving in excess of a 13-fold speedup between 8 and 192 cores. In contrast, both HugeSeq and GATK-Queue showed modest improvements in speed between 8 and 24 cores (2-fold), with a maximal 3-fold speedup being achieved with 48 cores, and no additional increase in speed beyond 48 cores. (B) Timing results for different steps of the Churchill pipeline were assessed with increasing numbers of cores. Complete human genome analysis was achieved in three hours using an in-house cluster with 192 cores and in 100 minutes at the Ohio Supercomputer Center (Glenn Cluster utilizing 700 cores). Results were confirmed using both the Pittsburgh Supercomputing Center and Amazon Web Services EC2.
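The Amdahl's law curve mentioned in panel (A) can be illustrated with a minimal sketch. It assumes the simple model in which a fixed sequential portion (one hour, as stated in the caption) does not shrink with more cores, while the remaining work divides evenly across them. The figure of 400 core-hours of parallelizable work and the function names are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def amdahl_runtime(sequential_hours, parallel_core_hours, cores):
    """Runtime in hours when only the parallel portion divides across cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return sequential_hours + parallel_core_hours / cores


def relative_speedup(sequential_hours, parallel_core_hours, base_cores, cores):
    """Speedup at `cores` relative to the runtime at `base_cores`."""
    base = amdahl_runtime(sequential_hours, parallel_core_hours, base_cores)
    return base / amdahl_runtime(sequential_hours, parallel_core_hours, cores)


# One-hour sequential time comes from the caption.
SEQUENTIAL_HOURS = 1.0
# ASSUMPTION: illustrative amount of parallelizable work, not from the paper.
PARALLEL_CORE_HOURS = 400.0

if __name__ == "__main__":
    for cores in (8, 24, 48, 96, 192):
        speedup = relative_speedup(SEQUENTIAL_HOURS, PARALLEL_CORE_HOURS, 8, cores)
        linear = cores / 8
        print(f"{cores:>3} cores: {speedup:5.2f}x vs 8 cores (linear: {linear:.0f}x)")
```

Under this model the speedup relative to 8 cores always falls below the linear value, and the gap widens as the core count grows, because the fixed sequential hour accounts for an increasing share of the total runtime.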
