EMRC data biases, normalization and CNV prediction ability. (a), (c), (e) Correlation between EMRC data and the three bias sources: GC content percentage (a), genomic mappability (c) and exon size (e). (b), (d), (f) Effect of the median normalization procedures on the removal of the three bias sources: GC content percentage correction (b), genomic mappability correction (d) and exon size correction (f). The upper border of the dashed lines is the 90th percentile of the EMRCs, while the lower border is the 10th percentile. (g), (h), (i), (j) Histograms and boxplots summarizing the capability of EMRC data to predict the exact number of DNA copies of a CNV region. (g) and (i) show the prediction capability of single-sample EMRC data, while (h) and (j) show the prediction capability of the EMRC ratio. EMRC ratios were calculated using the NA10847 sample as control. These calculations were performed on several broad genomic regions previously reported by McCarroll et al. to have copy numbers equal to 0, 1, 2, 3 and 4 in the eight samples from the 1000 Genomes Project. R is the Pearson correlation coefficient. CNV, copy number variant; EMRC, exon mean read count.
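
The caption does not give the formula behind the median normalization in panels (b), (d) and (f). As an illustration only, the sketch below uses one common form of median normalization: exons are grouped by a covariate (GC content, mappability or exon size), and each EMRC is rescaled by the ratio of the overall median to the median of its group. The function name `median_normalize`, the `bin_width` parameter and the toy values are assumptions introduced here, not taken from the source.

```python
from collections import defaultdict
from statistics import median


def median_normalize(emrc, covariate, bin_width=1.0):
    """Rescale each EMRC by overall_median / median_of_its_covariate_bin.

    Hypothetical helper: the binning scheme and the formula are assumptions,
    not the procedure confirmed by the caption.
    """
    # Group EMRC values by covariate bin (e.g. GC content percentage).
    bins = defaultdict(list)
    for value, cov in zip(emrc, covariate):
        bins[int(cov // bin_width)].append(value)

    overall = median(emrc)
    bin_medians = {b: median(values) for b, values in bins.items()}

    corrected = []
    for value, cov in zip(emrc, covariate):
        m_bin = bin_medians[int(cov // bin_width)]
        # Leave the value unchanged if the bin median is zero.
        corrected.append(value * overall / m_bin if m_bin > 0 else value)
    return corrected


# Toy data (invented): two GC bins with different typical coverage.
gc_percent = [30, 30, 30, 50, 50, 50]
emrc = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
normalized = median_normalize(emrc, gc_percent, bin_width=10)
```

After this rescaling, both GC bins in the toy data share the same median, which is the kind of flattening that panels (b), (d) and (f) are described as showing.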