From: FORGe: prioritizing variants for graph genomes

Results from the NA12878 simulation for GRCh37 Chromosome 9. Unpaired 100-nt reads were simulated from Chromosome 9 with NA12878’s variants included. FORGe and HISAT2 created and indexed augmented reference genomes with various variant sets. Besides the Pop Cov and Hybrid rankings, we also included a strategy that gave variants random ranks (“Random”). Panels a and d show the fraction of reads aligned. Panels b and e show the fraction aligned correctly to the simulated point of origin. Panel c plots a parametric curve of the fraction of reads with a correct alignment (vertical axis) versus the fraction with an incorrect alignment (horizontal axis). Lines follow measurements made over a range of SNV fractions, with points at 0%, 2%, 4%, 6%, 8%, 10%, and 15%, and then from 20% to 100% in 10-point increments. The diamond labeled HISAT2 auto is an augmented genome produced using HISAT2’s pruning scripts. The diamond labeled Major allele ref is a linear reference with every position set to its most frequent allele. Other diamonds indicate the SNV fraction maximizing y − x, where y is the fraction of reads aligned correctly and x is the fraction aligned incorrectly. The HISAT2 and Major allele diamonds are excluded from panels a, b, and f because there is no clear way to measure the fraction of variants these methods include. The black filled circle and square in panel c represent measurements with 0% and 100% of variants included, respectively.
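The sampling grid and the diamond-placement rule described in this caption can be sketched in a few lines. This is a minimal Python sketch, assuming the per-fraction measurements (fraction aligned correctly, fraction aligned incorrectly) have already been computed; the function names are hypothetical and are not part of FORGe or HISAT2, and breaking ties in favor of the smallest SNV fraction is an assumption.

```python
from typing import Sequence


def snv_fraction_grid() -> list[int]:
    """SNV percentages at which the curves are sampled:
    0-10% in 2-point steps, then 15%, then 20-100% in 10-point steps."""
    return list(range(0, 11, 2)) + [15] + list(range(20, 101, 10))


def best_snv_fraction(
    fractions: Sequence[float],
    correct: Sequence[float],
    incorrect: Sequence[float],
) -> float:
    """Return the SNV fraction maximizing y - x, where y is the fraction of
    reads aligned correctly and x the fraction aligned incorrectly.

    Hypothetical helper: ties resolve to the first (smallest) fraction,
    because max() returns the first maximal element it encounters.
    """
    if not fractions or not (len(fractions) == len(correct) == len(incorrect)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Score each sampled fraction by correct-minus-incorrect alignment rate.
    scores = [y - x for y, x in zip(correct, incorrect)]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return fractions[best_index]
```

Maximizing the difference y − x rewards additional correct alignments only when they are not offset by an equal or larger rise in incorrect alignments, which is the trade-off the parametric curve in panel c visualizes.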

