Size distribution of genetic variants. (a) A non-redundant size spectrum of SNPs and CNVs (including indels), with a breakdown of the proportion of gains to losses. The indel/CNV dataset consists of variants detected by assembly comparison, mate-pair, split-read, NimbleGen 42 M comparative genomic hybridization (CGH) and Agilent 24 M. The results show that the number of variants is negatively correlated with variant size. Although the proportions of gains and losses are roughly equal across the size spectrum, there are some deviations. Losses are more abundant in the 1 to 10 kb range, mainly because the 2-kb and 10-kb library mate-pair clones cannot detect insertions larger than their clone size. The opposite is seen for large events, where duplications are more common than deletions, which may reflect both biological and methodological biases. The increases in the number of events near 300 bp and 6 kb can be explained by short interspersed nuclear element (SINE) and long interspersed nuclear element (LINE) indels, respectively. The general peak around 10 kb corresponds to the interval with the highest clone coverage. (b) Size distribution of gains (insertions and duplications), highlighting the detection range of each methodology. The split-read method is designed to capture insertions from 11 bp up to the size of a Sanger-based sequence read (approximately 1 kb). No insertions were detected by the mate-pair approach in the size range between the 2-kb and 10-kb libraries. Furthermore, due to technical limitations, large gains (≥ 100,000 bp) cannot be identified with the sequencing-based approaches, whereas they are readily identified by microarrays. (c) Size distribution of deletions.
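A size spectrum of this kind can be tabulated by grouping variants into order-of-magnitude size bins and counting gains and losses in each bin. The following is a minimal sketch of that idea, not the procedure used to produce the figure: the function name `size_spectrum`, the `(size_bp, kind)` input layout, the decade-wide bins, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def size_spectrum(variants):
    """Count gains and losses per order-of-magnitude size bin.

    `variants` is an iterable of (size_bp, kind) pairs, where size_bp is a
    positive integer and kind is "gain" or "loss" (a hypothetical layout).
    Returns {decade: {"gain": n, "loss": n, "gain_fraction": f}}, where
    decade d covers sizes from 10**d up to (but not including) 10**(d + 1).
    """
    bins = defaultdict(lambda: {"gain": 0, "loss": 0})
    for size_bp, kind in variants:
        if not isinstance(size_bp, int) or size_bp < 1:
            raise ValueError(f"size must be a positive integer, got {size_bp!r}")
        if kind not in ("gain", "loss"):
            raise ValueError(f"kind must be 'gain' or 'loss', got {kind!r}")
        # Digit count minus one gives floor(log10(size)) without float rounding.
        decade = len(str(size_bp)) - 1
        bins[decade][kind] += 1

    spectrum = {}
    for decade in sorted(bins):
        counts = bins[decade]
        total = counts["gain"] + counts["loss"]
        spectrum[decade] = {
            "gain": counts["gain"],
            "loss": counts["loss"],
            "gain_fraction": counts["gain"] / total,
        }
    return spectrum


# Toy input, invented for illustration only (not data from the study).
toy_variants = [
    (1, "loss"),
    (300, "gain"),
    (320, "gain"),
    (2500, "loss"),
    (6000, "loss"),
    (150000, "gain"),
]
toy_spectrum = size_spectrum(toy_variants)
```

A `gain_fraction` near 0.5 in a bin would correspond to the roughly equal proportions described in (a), while values well below or above 0.5 would flag bins such as the 1 to 10 kb range or the large-event range where one class dominates.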