From: Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics

Churchill enables rapid secondary analysis and variant calling with GATK HaplotypeCaller using cloud computing resources. Raw sequence data for a single human genome (30× coverage) were analyzed with both Churchill and bcbio-nextgen; both pipelines used BWA-MEM for alignment and GATK HaplotypeCaller for variant detection and genotyping. (A) CPU utilization was monitored throughout the analysis on a single r3.8xlarge AWS EC2 instance (32 cores). Churchill achieved higher resource utilization (94%) than bcbio-nextgen (57%), allowing the entire analysis to be completed in under 12 hours on a single instance. (B) Unlike bcbio-nextgen, Churchill allows all steps of the analysis to be scaled efficiently across multiple compute nodes, resulting in significantly reduced run times. With 16 AWS EC2 instances the entire analysis was completed in 104 minutes, of which the GATK HaplotypeCaller variant calling and genotyping stage took only 24 minutes.
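The timings quoted in the caption can be turned into rough scaling figures. The sketch below is illustrative only and is not part of the original analysis. It treats the "under 12 hours" single-instance figure as an upper bound of 720 minutes, so the resulting speedup and efficiency are upper bounds as well. It also assumes the 16 instances are comparable to the single r3.8xlarge instance, which the caption does not state. The function and variable names are hypothetical.

```python
def scaling_summary(single_node_min, multi_node_min, nodes, stage_min):
    """Return (speedup, parallel efficiency, stage fraction of run time)."""
    speedup = single_node_min / multi_node_min   # how many times faster
    efficiency = speedup / nodes                 # speedup per added node
    stage_fraction = stage_min / multi_node_min  # share of the multi-node run
    return speedup, efficiency, stage_fraction


# Values taken from the caption. The single-node time is an UPPER BOUND
# ("under 12 hours"), so speedup and efficiency are upper bounds too.
SINGLE_NODE_MIN = 12 * 60   # assumed bound: 720 minutes
MULTI_NODE_MIN = 104        # 16-instance run time
NODES = 16
HAPLOTYPECALLER_MIN = 24    # variant calling and genotyping stage

speedup, efficiency, hc_fraction = scaling_summary(
    SINGLE_NODE_MIN, MULTI_NODE_MIN, NODES, HAPLOTYPECALLER_MIN
)

print(f"speedup (at most):    {speedup:.1f}x")
print(f"efficiency (at most): {efficiency:.0%}")
print(f"HaplotypeCaller share of 16-node run: {hc_fraction:.0%}")
```

Under these assumptions the 16-instance run is at most about 6.9 times faster than the single-instance run, which corresponds to a parallel efficiency of at most about 43%, and the HaplotypeCaller stage accounts for about 23% of the 104-minute run.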

