

Fig. 3 | Genome Biology

Fig. 3

From: miRTrace reveals the organismal origins of microRNA sequencing data

Prevalence, causes, and effects of cross-contamination in miRNA-seq data. a Percentages of public mouse, nematode, and fly datasets that contain primate sequences (putative human contamination). b Percentages of primate sequences in putatively contaminated datasets (considering only clade-specific sequences). The white dot indicates the median, the thick black line indicates the range from the 25th to the 75th percentile, and the whole gray area indicates the range from the 5th to the 95th percentile. c Human and mouse samples were profiled in parallel with Illumina sequencing to detect sources of cross-contamination (top). Using the Illumina default settings for sample assignment, rodent contamination was detected in all nine human samples. Each bar represents one sample, and the number in each bar indicates the number of contaminating sequences (bottom). d Using more stringent settings for sample assignment, only sequences with perfect matches to known sample indices were retained (top). This computational step removed most of the rodent contamination (bottom). e In an additional filtering step, sequences with inconsistent indices were discarded (top), removing contamination completely from six of the nine human samples (bottom). f Mouse samples were contaminated in silico with controlled abundances of human or fly sequences ranging from 0 to 100% (top). Principal component analyses (PCA) show how the overall miRNA composition of the sample changes with increasing levels of contamination (bottom). g Effects of the contamination from (f) on differential gene expression (DE) analyses. Sensitivity, specificity, and accuracy (in shades of brown) are given as fractions, while the false positive rates (in red) are absolute numbers. h Effects of contamination on the prediction of novel miRNAs
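
The quantities reported in panel g can be illustrated with a minimal sketch. It assumes, hypothetically, that the DE calls from the uncontaminated sample serve as the reference and that DE calls from a contaminated sample are compared against them; the function name `de_metrics`, its arguments, and the miRNA names are illustrative and are not taken from the article or from miRTrace. The standard definitions are used: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / total, with false positives returned as a count.

```python
def de_metrics(reference_de, observed_de, all_mirnas):
    """Compare DE calls from a contaminated sample against reference DE calls.

    All three arguments are iterables of miRNA names (hypothetical inputs).
    `all_mirnas` is the full set of tested miRNAs and must contain both call sets.
    """
    reference_de = set(reference_de)
    observed_de = set(observed_de)
    all_mirnas = set(all_mirnas)

    tp = len(reference_de & observed_de)   # DE in both analyses
    fp = len(observed_de - reference_de)   # DE only after contamination
    fn = len(reference_de - observed_de)   # DE calls lost after contamination
    tn = len(all_mirnas - reference_de - observed_de)  # DE in neither
    total = tp + fp + fn + tn

    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / total if total else nan,
        "false_positives": fp,  # absolute count, not a fraction
    }


# Toy example with ten made-up miRNA names.
all_mirnas = [f"miR-{i}" for i in range(1, 11)]
reference = ["miR-1", "miR-2", "miR-3", "miR-4"]
observed = ["miR-1", "miR-2", "miR-3", "miR-5"]
metrics = de_metrics(reference, observed, all_mirnas)
print(metrics)
```

In this toy example, three of the four reference calls are recovered and one spurious call appears, so the sensitivity is 0.75 and the false positive count is 1.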
