Figure 2 | Genome Biology

Figure 2

From: Extending reference assembly models

(a) Alignment of decoy-specific reads to GRCh38. To identify decoy-specific reads, the following read sets were used: /panfs/traces01/compress/1KG.p2/NA19200.mapped.ILLUMINA.bwa.YRI.low_coverage.20120522.8levels.csra and /panfs/traces01/compress/1KG.p2/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.8levels.csra. Using SRPRISM [26] (with parameters [p] (force paired/unpaired search): false; [n] (maximum number of allowed errors): 6; [M] (maximum allowed memory usage): 2048), reads with MAPQ >20 that aligned uniquely to SN:hs37d5 (the decoy) were extracted. As a control, these reads were aligned to a target set comprising the GRCh37 primary assembly unit (GCF_000001305.13), chr. MT (GCF_000006015.1) and the decoy. Collectively, this target set is referred to as GRCh37pmd. To assess capture of the decoy sequence in the updated assembly, the reads were aligned to a target set comprising the GRCh38 primary assembly (GCF_000001305.14) and chr. MT (GCF_000006015.1) (referred to as GRCh38pm) or to the full GRCh38 assembly (GCF_000001405.26). A captured read was defined as one that aligned only to the decoy (by SRPRISM) when GRCh37pmd was the target set, and that also aligned to GRCh38pm or to the full GRCh38 assembly.

(b) Incorporation of decoy sequences in GRCh38. Upper panel: graphical view of NT_187681.1, a GRCh38 alternative loci scaffold. Blue bars are assembly components; gene annotations are shown in green. Two components (AC226006.2 and AC208587.4, highlighted in red) in this alt scaffold are fosmids from which decoy sequences were derived (AC208587.4:7500–12511, AC226006.2:5581–8770, AC226006.2:20483–21626). The decoy sequences represent variants from the chromosome. The alignments of the decoy fragments and GRCh38 chr. 11 to the alternative scaffold are shown as gray bars (red tick, mismatch; blue tick, deletion; thin red line, insertion). Note that the decoy fragments align to regions that are missing from the reference chromosome.
Lower panel: graphical view of NC_000011.10, GRCh38 chr. 11. An assembly component (AC239832.2, highlighted in red) added to this region for GRCh38 is a fosmid from which decoy sequence was derived (AC239832.3:6602–10993, AC239832.3:24434–27364). In this case, the decoy adds exonic sequence from MUC2 that is missing from component AC139749.4. The alignments of the decoy fragments and NT_187681.1 to the alternative scaffold are shown as gray bars (red tick, mismatch; blue tick, deletion; thin red line, insertion).
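
The captured-read definition in panel (a) amounts to a simple set filter, which can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `captured_reads`, the dictionary inputs (read ID mapped to the set of target sequences it aligned to), and the toy read IDs are all hypothetical, and parsing of SRPRISM output is not shown. Only the decoy name `hs37d5` comes from the caption.

```python
from typing import Dict, Set


def captured_reads(
    grch37pmd_hits: Dict[str, Set[str]],
    grch38_hits: Dict[str, Set[str]],
    decoy_name: str = "hs37d5",
) -> Set[str]:
    """Return IDs of reads that meet the caption's 'captured' definition.

    grch37pmd_hits: read ID -> target sequences hit in the GRCh37pmd set
                    (hypothetical input structure).
    grch38_hits:    read ID -> target sequences hit in GRCh38pm or the
                    full GRCh38 assembly (hypothetical input structure).
    """
    captured = set()
    for read_id, targets in grch37pmd_hits.items():
        # Condition 1: on GRCh37pmd the read aligned only to the decoy.
        decoy_only = targets == {decoy_name}
        # Condition 2: the read has at least one alignment on GRCh38.
        aligned_to_grch38 = bool(grch38_hits.get(read_id))
        if decoy_only and aligned_to_grch38:
            captured.add(read_id)
    return captured


# Toy data (invented for illustration only).
grch37pmd_hits = {
    "read1": {"hs37d5"},        # decoy only
    "read2": {"hs37d5", "11"},  # decoy plus a primary chromosome
    "read3": {"hs37d5"},        # decoy only, but no GRCh38 alignment
}
grch38_hits = {
    "read1": {"chr11"},
    "read2": {"chr11"},
}
example = captured_reads(grch37pmd_hits, grch38_hits)
```

With this toy input, only `read1` satisfies both conditions; `read2` fails the decoy-only test and `read3` has no GRCh38 alignment.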
