Figure 6 | Genome Biology

Figure 6

From: Sequence context affects the rate of short insertions and deletions in flies and primates

Predicting insertion and deletion rates using the indel propensity probabilistic score. (a) The model. We trained a positional Markov model that computes the probability of the nucleotide at each position relative to the insertion/deletion point, given that position, the previous nucleotide, and possibly the nucleotide l bp upstream of it (where l is the event length). The model was parameterized separately for each event length, event type (insertion/deletion) and GC content region. A locus is scored for indel propensity by dividing its probability under this model by its probability under a similar model estimated from background sequences. (b) Predicting indel rates. The graphs summarize the results of a cross-validation assay in which indel propensity models were trained on half of the aligned human chromosomes and then applied to predict insertions and deletions of various lengths in the other half of the genome. For each type of event, we show the relative increase in indel probability in the human lineage as a function of the propensity score. In all cases, the model robustly predicts a 100- to 1,000-fold increase in the indel rate for high- versus low-scoring loci. Similar results for chimp events are shown in Figure S5 in Additional data file 1.
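
The scoring scheme in panel (a) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a first-order positional model only (the optional dependence on the nucleotide l bp upstream is left out), add-one pseudocounts, and a log ratio in place of the plain probability ratio. The function names and the toy windows are hypothetical and serve only to show the mechanics; windows are assumed to be equal-length strings over A, C, G and T, aligned on the event point.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
START = "^"  # placeholder "previous nucleotide" for the first position


def train_positional_markov(windows, pseudocount=1.0):
    """Estimate P(x_i | i, x_{i-1}) from equal-length aligned windows."""
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    # counts[i][prev][cur]: how often `cur` follows `prev` at position i
    counts = [defaultdict(lambda: defaultdict(float)) for _ in range(width)]
    for w in windows:
        prev = START
        for i, cur in enumerate(w):
            counts[i][prev][cur] += 1.0
            prev = cur
    model = []
    for i in range(width):
        contexts = [START] if i == 0 else list(ALPHABET)
        table = {}
        for p in contexts:
            total = sum(counts[i][p][c] for c in ALPHABET)
            total += pseudocount * len(ALPHABET)
            table[p] = {c: (counts[i][p][c] + pseudocount) / total
                        for c in ALPHABET}
        model.append(table)
    return model


def log_prob(model, window):
    """Log-probability of a window under a positional Markov model."""
    prev, lp = START, 0.0
    for i, cur in enumerate(window):
        lp += math.log(model[i][prev][cur])
        prev = cur
    return lp


def propensity_score(event_model, background_model, window):
    """Log of P(window | event model) / P(window | background model)."""
    return log_prob(event_model, window) - log_prob(background_model, window)


# Toy, made-up windows: one set around events, one set from background.
event_windows = ["AAAAAA", "AAAATA", "TAAAAA"]
background_windows = ["ACGTAC", "GTCAGT", "TGACCA"]

event_model = train_positional_markov(event_windows)
background_model = train_positional_markov(background_windows)

high = propensity_score(event_model, background_model, "AAAAAA")
low = propensity_score(event_model, background_model, "ACGTAC")
```

In this toy setting the homopolymer window scores higher than the mixed window, because it resembles the event-associated training windows more than the background ones. Following the caption, a fuller version would train one such pair of models per event length, event type and GC content region.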
