Figure 5 | Genome Biology

Figure 5

From: Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics

Churchill enables population-scale whole human genome sequence analysis. Churchill was used to analyze 1,088 of the low-coverage whole-genome samples that were included in ‘phase 1’ of the 1000 Genomes Project (1KG). Raw sequence data for the entire population were used to generate a single multi-sample VCF in 7 days using 400 AWS EC2 instances (cc2.8xlarge spot instances). The resulting Churchill filtered VCF (green) was then compared to the 1KG Consortium’s VCF (red), with Churchill calling 41.2 million variants and the 1KG VCF file containing 39.7 million. The two VCF file sets had a total of 34.4 million variant sites in common. (A) There were 33.2 million SNPs called in common, with validation rates against known SNPs being highly similar: 52.8% (Churchill) and 52.4% (1KG). (B) Churchill called three-fold more indels, of which 19.5% were known compared with 12.5% in the 1KG indel set. The indels unique to Churchill have a seven-fold higher rate of validation with known variants than those unique to 1KG. (C) Minor allele frequencies were compared for the 34.3 million variants with the same minor allele, and a density-binned scatter plot was produced (scaled from low (light blue) to high (purple) density frequencies). The results from Churchill and the original 1KG analysis demonstrated highly concordant minor allele frequencies (R² = 0.9978, P-value < 2.2e-16).
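
The call-set overlap described in the caption can be turned into Venn-style counts with simple subtraction. The following is a minimal sketch, not part of the original analysis: the names `VennCounts` and `venn_counts` are hypothetical, and it assumes the two totals (41.2 and 39.7 million) and the shared count (34.4 million) are counted on the same basis, which the caption does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VennCounts:
    """Sizes of the three regions of a two-set Venn diagram (millions)."""
    only_a: float
    shared: float
    only_b: float

    @property
    def union(self) -> float:
        # Every site seen in at least one call set.
        return self.only_a + self.shared + self.only_b


def venn_counts(total_a: float, total_b: float, shared: float,
                ndigits: int = 1) -> VennCounts:
    """Derive the set-specific counts from two totals and their overlap.

    Values are rounded to `ndigits` because the caption reports figures
    to one decimal place (in millions).
    """
    if shared > min(total_a, total_b):
        raise ValueError("shared count cannot exceed either total")
    return VennCounts(
        only_a=round(total_a - shared, ndigits),
        shared=shared,
        only_b=round(total_b - shared, ndigits),
    )


# Figures taken from the caption (millions of variants).
# Assumption: totals and shared sites are directly comparable.
counts = venn_counts(total_a=41.2, total_b=39.7, shared=34.4)
print(f"Churchill only: {counts.only_a} M")
print(f"1KG only:       {counts.only_b} M")
print(f"Shared / union: {counts.shared / counts.union:.3f}")
```

Under that assumption, roughly 6.8 million variants are specific to the Churchill call set and roughly 5.3 million are specific to the 1KG call set.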
