Inferred human-mouse similarity distributions for aligned genomic blocks. (a) Normal distributions were calculated as an estimate of human-mouse similarity in the neutral genomic fraction (solid) and in the selected genomic fraction (dashed), assuming mean percentage identities of 66.7% and 84.7%, respectively. The graphs show analyses for three block sizes: 50 bp, 100 bp, and 200 bp. Calculations are based on the normal approximation to the binomial distribution with n = block size and p = mean percentage identity, which gives the probability distribution of the number of matches in a pairwise alignment of length n. Each alignment position is treated as an independent Bernoulli trial, where p is the probability that the two aligned sequences carry an identical nucleotide. All frequencies are normalized to a sum of 1, with the selected population making up 1/8 of the total. Compare to [10, 39] for whole-genome analysis of actual data, and to Figure 4 for specified genomic regions. Note that the standard deviation of real data is larger than that computed for the binomial model with independent sequence positions. In addition, while the model assumes a fixed probability of nucleotide identity (p), the real substitution rate varies locally across the genome. (b) Logarithmic transform of the distributions presented in (a). The frequency of 100% identical 100 bp blocks is approximately 10⁻¹² for the neutral portion and approximately 2 × 10⁻⁶ for the selected portion. Given that around 1.2 × 10⁹ bases are aligned (1.2 × 10⁷ blocks of 100 bp), about 20 blocks of 100% identity are expected among the selected DNA segments, and far fewer than one (about 10⁻⁵) among the neutral ones. These values are a lower bound for the actual number of such blocks in the genome, because they relate to non-overlapping windows.
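
The arithmetic behind panel (b) can be reproduced with a minimal Python sketch. It rests on assumptions not stated explicitly in the caption: the frequency of a fully identical block is taken as the normal density evaluated at n matches, and the neutral population is given weight 7/8, the complement of the 1/8 selected share. All function and variable names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def block_frequency(matches, n, p, weight):
    """Weighted normal approximation to Binomial(n, p), evaluated at `matches`.

    Assumption: the frequency is read off as the density at `matches`,
    scaled by the population weight so both populations sum to 1.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return weight * normal_pdf(matches, mean, sd)


N = 100                       # block size in bp
P_NEUTRAL = 0.667             # mean identity, neutral fraction
P_SELECTED = 0.847            # mean identity, selected fraction
W_SELECTED = 1 / 8            # selected share of the total (from the caption)
W_NEUTRAL = 1 - W_SELECTED    # assumption: neutral is the remaining 7/8
ALIGNED_BASES = 1.2e9         # aligned bases (from the caption)

n_blocks = ALIGNED_BASES / N  # non-overlapping 100 bp blocks

# Frequency of blocks with N matches out of N positions (100% identity).
f_neutral = block_frequency(N, N, P_NEUTRAL, W_NEUTRAL)
f_selected = block_frequency(N, N, P_SELECTED, W_SELECTED)

# Expected number of fully identical blocks in each population.
expected_neutral = n_blocks * f_neutral
expected_selected = n_blocks * f_selected

print(f"neutral:  frequency {f_neutral:.2e}, expected blocks {expected_neutral:.2e}")
print(f"selected: frequency {f_selected:.2e}, expected blocks {expected_selected:.1f}")
```

Under these assumptions the sketch gives frequencies on the order of 10⁻¹² (neutral) and 2 × 10⁻⁶ (selected), and roughly 20 expected fully identical blocks in the selected fraction, in line with the values quoted for panel (b).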