Fig. 5 | Genome Biology

Fig. 5

From: DNA polymerase stalling at structured DNA constrains the expansion of short tandem repeats

DNA polymerase stalling at DNA structures predicts the abundance, length and stability of STRs in the human genome. a Abundance, i.e. the number of occurrences, of the 501 unique double-stranded STR motifs in the human genome, binned by their average stall scores at 2 min for 48-nt-long repeats. b Range of lengths of double-stranded STR motifs, binned by their average stall scores at 30 min for 24-nt-long repeats. The plot reports the 1000 longest repeat instances from each bin. Similar analyses of other eukaryotic genomes are reported in Additional file 1: Figure S12. c Distribution of expansion/contraction rates of the AAAT, CGG and AGGGG motifs, which are characterised by low, medium and high stall scores, respectively, at 30 min for 24-nt-long repeats. Shaded boxes span from the 5th to the 95th percentiles of the distributions. d Expansion/contraction rates (μ) and e length constraints (β) associated with the double-stranded STR motifs, binned by their average stall scores at 30 min for 24-nt-long repeats. The plots show a correlation between length stability and the ability of an STR to stall polymerase, which is due to a stronger length constraint, reflected by higher β values. f SNP density at positions surrounding STRs, binned according to their stability (top) or their stall scores at 30 min for 24-nt-long repeats (bottom: low σ < 0.33, medium 0.33 ≤ σ < 0.66, high σ ≥ 0.66). SNP density at each position was computed by dividing the number of STRs marked by a SNP at that position by the total number of STRs considered. The sawtooth profiles in SNP density are due to the underlying periodicity of the repeats (see Additional file 1: Figure S14a). Centre lines denote medians, boxes span the interquartile range and whiskers extend up to 1.5 times the interquartile range beyond the box limits. Shaded boxes highlight the range of the medians.
Reported P values assess SNP enrichment in the vicinity of STRs and were computed using two-sided Fisher's exact tests on the average frequency of SNPs at positions −1 and +1 from the start and the end of the repeats.
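The per-position SNP density described for panel f can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format (one stall score σ plus one set of SNP-bearing relative positions per STR) and the helper names `stall_bin`, `snp_density` and `density_by_bin` are hypothetical. The sketch applies the caption's σ thresholds, then, for each position, divides the number of STRs carrying a SNP there by the number of STRs in the bin.

```python
from collections import defaultdict


def stall_bin(sigma: float) -> str:
    """Assign a stall score to the low/medium/high bins given for panel f."""
    if sigma < 0.33:
        return "low"
    if sigma < 0.66:
        return "medium"  # 0.33 <= sigma < 0.66
    return "high"  # sigma >= 0.66


def snp_density(snp_positions_per_str, positions):
    """Fraction of STRs that carry a SNP at each requested relative position.

    snp_positions_per_str: one set per STR holding the relative positions that
    carry a SNP (hypothetical encoding, e.g. -1 = first base before the repeat).
    """
    n = len(snp_positions_per_str)
    if n == 0:
        return {p: 0.0 for p in positions}
    return {
        p: sum(1 for snps in snp_positions_per_str if p in snps) / n
        for p in positions
    }


def density_by_bin(strs, positions):
    """Group STRs by stall-score bin, then compute SNP density within each bin.

    strs: iterable of (sigma, snp_positions) pairs (hypothetical input format).
    """
    groups = defaultdict(list)
    for sigma, snps in strs:
        groups[stall_bin(sigma)].append(snps)
    return {b: snp_density(g, positions) for b, g in groups.items()}


# Toy example with five made-up STRs (not data from the figure).
toy = [
    (0.10, {-1, 5}),
    (0.20, set()),
    (0.50, {-1}),
    (0.70, {1}),
    (0.90, set()),
]
result = density_by_bin(toy, positions=[-1, 1])
# result["low"]    == {-1: 0.5, 1: 0.0}
# result["medium"] == {-1: 1.0, 1: 0.0}
# result["high"]   == {-1: 0.0, 1: 0.5}
```

Computing the density separately within each bin keeps the denominator equal to the number of STRs in that bin, so bins of different sizes remain comparable.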
