Figure 5 | Genome Biology

Figure 5

From: Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

Indels are clustered throughout the genome. (A) A representative multikilobase span, where ‘X’ indicates an indel and dashes (‘–’) signify non-polymorphic repeat sequences. (B) The number of ‘–’ characters between each pair of consecutive indels (‘X’) was counted across the genome and compiled into a histogram (purple). The exponential distribution expected from the observed indel probability, assuming random dispersion of indels, is shown in gray. Inset: the analogous plot for ‘dense’ regions identified by the hidden Markov model (HMM). (C) (i) Schematic of the HMM used to distinguish indel-dense from indel-sparse regions. (ii) Fractional share of total indels (left) and of bases in the genome (right) present in ‘dense’ (blue) and ‘sparse’ (red) regions. (D) Relative enrichment of three different sequence features between ‘dense’ and ‘sparse’ regions. Error bars indicate ±S.E.M. across regions, propagated through division. (E) The indel concentration, measured as indels per repeat sequence, in 7.5 kb windows centered at replication origins was calculated as a function of replication-origin offset (that is, 0 kb is the native origin location). Step size is 1 kb, and the average value across three adjacent windows is plotted. (F) The total number of repeat sequences present in non-overlapping 1 kb windows centered at replication origins.
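
The gap-counting comparison in panel (B) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy track string, and the choice of a geometric distribution (the discrete analogue of the exponential named in the caption) as the random-dispersion null are all assumptions made for illustration.

```python
from collections import Counter


def gap_lengths(track):
    """Count the '-' characters between each pair of consecutive 'X' marks.

    `track` is a hypothetical string encoding of panel (A): 'X' marks a
    repeat carrying an indel, '-' marks a non-polymorphic repeat.
    """
    positions = [i for i, ch in enumerate(track) if ch == "X"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def expected_gap_counts(track, max_gap):
    """Expected gap histogram if indels were dispersed at random.

    Assumption: each repeat carries an indel independently with the
    observed probability p, so gaps follow a geometric distribution,
    P(gap = k) = p * (1 - p)**k, which approximates an exponential
    distribution when p is small.
    """
    n_indels = track.count("X")
    p = n_indels / len(track)
    n_gaps = max(n_indels - 1, 0)
    return [n_gaps * p * (1 - p) ** k for k in range(max_gap + 1)]


# Toy example only; real input would be the genome-wide repeat track.
track = "--X-X---X------XX-"
observed = Counter(gap_lengths(track))
expected = expected_gap_counts(track, max_gap=10)

for k in range(11):
    print(k, observed.get(k, 0), round(expected[k], 3))
```

Under this null, clustering would show up as an excess of short gaps and of very long gaps in the observed histogram relative to the expected counts.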
