Figure 2 | Genome Biology

Figure 2

From: Estimating enrichment of repetitive elements from high-throughput sequence data

Phylogenetic analysis of repeat enrichment patterns. (a) To provide the most informative estimates of repeat enrichment attainable for a given dataset, repeats are organized into a phylogenetic tree on the basis of read set similarity, maximizing the number of uniquely assignable reads that can be used for enrichment estimation. The estimates are shown as colors on the resulting tree branches. The nodes of the tree represent sets of repetitive sequences. The gray labels show the fraction of the total number of ChIP reads mapping to a given set of sequences (node) that can be associated with that set uniquely. The tree is constructed so as to maximize the number of additional uniquely associated reads gained at each step. For instance, considering repeats A and B together allows 1,000 uniquely associated ChIP reads to be utilized for enrichment estimation, even though the reads uniquely associated with repeat A and with repeat B separately sum to only 600. The 400 additional reads are those that map to both A and B repeats but do not map to any other repeats (in the same way that the discarded read in Figure 1 maps to both C and D). The length of each branch corresponds, on a log scale, to the number of unique reads gained when the sequences of the descendant nodes are collapsed into a single set. The statistical significance of the observed enrichment or depletion is shown as a Z-score (green numbers). Large positive Z-score values denote statistically significant enrichment (a Z-score of 3.1 corresponds to a P-value of 10^-3), and negative values correspond to significant depletion. The Z-score magnitude is capped at 10.
(b) A fragment of the enrichment phylogeny of the Repbase repeat types for H3K9me3 enrichment in mES cells. The example illustrates the grouping of repeats from the ERV-K class, all of which, with the exception of RLTR19-int, are highly enriched for the H3K9me3 modification. Additional examples are shown in Figure S4 of Additional file 1.
(c) A small fragment of the H3K9me3 enrichment phylogeny for the individual instances of the intracisternal A particle (IAP) interspersed repeats (IAPEz-int). The fragment clusters instances located within a specific region on chromosome X because of the high degree of sequence identity between them. While the lack of discriminating sequences precludes evaluation of each instance individually, considering nearly identical instances together allows the demonstration of statistically significant enrichment of this localized group of instances for the H3K9me3 mark in mES cells. LTR, long terminal repeat.
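
The read-counting arithmetic in panel (a) and the quoted Z-score to P-value correspondence can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy read counts (including the 350/250 split between A and B and the extra repeat C), and the use of a one-sided normal tail are assumptions made for illustration. Only the totals of 600, 400 and 1,000 reads and the pairing of Z = 3.1 with P of about 10^-3 come from the caption.

```python
import math

# Hypothetical toy data: each key is the set of repeats a read maps to,
# each value is the number of ChIP reads with that mapping pattern.
# The 350/250 split and repeat "C" are illustrative assumptions.
read_hits = {
    frozenset({"A"}): 350,
    frozenset({"B"}): 250,
    frozenset({"A", "B"}): 400,
    frozenset({"A", "C"}): 120,
}

def unique_reads(hits_table, repeat_set):
    """Count reads whose mapped repeats all lie inside repeat_set."""
    repeat_set = frozenset(repeat_set)
    return sum(
        count
        for hits, count in hits_table.items()
        if hits and hits <= repeat_set  # subset test: no hits outside the set
    )

def merge_gain(hits_table, set_a, set_b):
    """Additional uniquely associated reads gained by collapsing two sets."""
    merged = frozenset(set_a) | frozenset(set_b)
    return (
        unique_reads(hits_table, merged)
        - unique_reads(hits_table, set_a)
        - unique_reads(hits_table, set_b)
    )

def one_sided_p(z):
    """Upper-tail P-value of a standard normal Z-score (assumed one-sided)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

separate = unique_reads(read_hits, {"A"}) + unique_reads(read_hits, {"B"})
together = unique_reads(read_hits, {"A", "B"})
gain = merge_gain(read_hits, {"A"}, {"B"})
print(separate, together, gain)
print(one_sided_p(3.1))
```

In this toy table the reads that map to both A and C stay unassigned until C joins the set, which mirrors the caption's point that reads shared with other repeats are not counted.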
