Fig. 3 | Genome Biology

Fig. 3

From: DNA polymerase stalling at structured DNA constrains the expansion of short tandem repeats

Hairpin-like and tetrahelical structures arise from distinct STR families with distinct properties. a Distribution, together with the statistical significance, of the averaged computed stall scores at 0.5 min for the 964 unique single-stranded STR motifs at a length of 48 nucleotides, together with their structures predicted using a supervised machine learning algorithm. The Q values are computed by combining the P values associated with the statistical differences in stall scores, at all time points, between a given STR motif and the random control sequences of varying GC content (see the “Methods” section for details), and reflect that DNA polymerase stalling at STRs is structure dependent rather than sequence dependent. Distribution of stall scores at b 0.5 min and c 30 min for the different structural classes of STRs of any length, highlighting the transient and persistent nature of DNA polymerase stalling at hairpin-like and tetrahelical STRs, respectively. d Fraction of predicted structured STRs as a function of STR length, highlighting the length-dependent structure formation. e Distribution of stall scores at 30 min for each structural class as a function of STR length. The reported structures are those predicted when the STRs are 72 nt long. Centre lines denote medians, boxes span the interquartile range and whiskers extend beyond the box limits by 1.5 times the interquartile range. P values for the comparison of the distributions were calculated using Kolmogorov–Smirnov tests: n.s. P > 0.05, *P ≤ 0.05, **P < 0.01, ***P < 0.001. f Hierarchical clustering of single-stranded STR motifs by sequence, using cosine distances as a measure of sequence similarity, together with heatmaps reporting GC content and sequence entropy. The leaves of the dendrogram are coloured according to the predicted structures of the STRs. This representation highlights two distinct families of STRs, forming G4s (red) and i-motifs (yellow). Hairpin-like STRs (blue) are dispersed over the dendrogram, showing a higher degree of sequence diversity.
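The quantities named in panel f (GC content, sequence entropy and cosine distance between sequences) can be illustrated with a minimal Python sketch. The caption does not say how sequences were converted to vectors or how entropy was defined, so the dinucleotide-count featurisation, the mononucleotide Shannon entropy, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def repeat_motif(motif, length=48):
    """Tile a motif and trim it to a fixed length (48 nt, as in panel a)."""
    copies = length // len(motif) + 1
    return (motif * copies)[:length]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def shannon_entropy(seq):
    """Shannon entropy in bits of the mononucleotide composition (assumed definition)."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def kmer_vector(seq, k=2):
    """Counts of every possible k-mer in a fixed order (assumed featurisation)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def cosine_distance(u, v):
    """1 minus cosine similarity; both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


if __name__ == "__main__":
    # Hypothetical example motifs, chosen only to exercise the functions.
    motifs = ["GGGT", "CCCA", "CAG"]
    seqs = {m: repeat_motif(m) for m in motifs}
    for m, s in seqs.items():
        print(m, round(gc_content(s), 3), round(shannon_entropy(s), 3))
    for i, a in enumerate(motifs):
        for b in motifs[i + 1:]:
            d = cosine_distance(kmer_vector(seqs[a]), kmer_vector(seqs[b]))
            print(a, b, round(d, 3))
```

A pairwise distance matrix built this way could be passed to any standard agglomerative clustering routine to obtain a dendrogram; the linkage method is likewise not specified in the caption.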
