One-parameter model reveals strong relationship between indel rate and repeat-sequence abundance. (A) Indel rate plotted as a function of repeat length; colors indicate the inserted or deleted nucleotides, as shown in the legend. Repeat length is the average of the ‘reference’ and ‘indel’ read lengths; thus, for single-base indels, repeat length is ‘x.5’ for integer values of x. (B-E) Gray dotted lines show repeat-sequence abundance as a function of length for A:T homopolymers (B, E), G:C homopolymers (C), and AT:TA dyad-repeats (D). The colored lines show the lowest-error model fit based on the indel rates in (A), with the error and α values indicated. To prevent overfitting at low repeat-length values, error is calculated as the average squared deviation in log space rather than in linear space. (F) Abundance of A:T homopolymers as a function of length in the indicated organisms. A histogram was generated for each species independently; to facilitate comparisons among species, the data were normalized such that the abundance at length 3 is 1.0 and then scaled, to adjust for differences in genomic A:T content, such that the abundance at length 6 is 0.75. The dashed line indicates where α = 0.
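
The error metric described for panels (B-E) can be illustrated with a minimal sketch. The function names, the choice of base-10 logarithm, the handling of zero counts, and the example numbers are all assumptions made for illustration; the caption specifies only that error is the average squared deviation in log space.

```python
import math


def log_space_mse(observed, predicted):
    """Average squared deviation between log10(observed) and log10(predicted).

    Assumption: pairs containing a non-positive value are skipped, because
    their logarithm is undefined. The caption does not say how such bins
    are handled.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0 and p > 0]
    if not pairs:
        raise ValueError("no positive (observed, predicted) pairs to compare")
    total = sum((math.log10(o) - math.log10(p)) ** 2 for o, p in pairs)
    return total / len(pairs)


def linear_mse(observed, predicted):
    """Average squared deviation in linear space, shown only for contrast."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return total / len(observed)


if __name__ == "__main__":
    # Hypothetical abundances at three increasing repeat lengths (made-up data).
    observed = [1000.0, 100.0, 10.0]
    predicted = [1100.0, 100.0, 5.0]
    print("linear-space error:", linear_mse(observed, predicted))
    print("log-space error:", log_space_mse(observed, predicted))
```

In this made-up example, the linear-space error is dominated by the 10% miss at the shortest, most abundant repeat length, whereas the log-space error is dominated by the two-fold miss at the longest length. This is consistent with the stated motivation: a fit scored in log space is not driven mainly by the low repeat-length values.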