Fig. 3 | Genome Biology

Fig. 3

From: Rapid and sensitive detection of genome contamination at scale with FCS-GX

FCS-GX detection of contamination in NCBI databases.

a Distribution of the proportion of contaminated sequence per genome detected by FCS-GX in the NCBI GenBank database. Genome counts (frequency) were computed in 5% intervals.

b Aggregate length of total genome sequence (solid line) and contaminated sequence detected by FCS-GX (dashed line) in the NCBI GenBank database from 2017 to 2023.

c Percentage of contaminated sequence detected by FCS-GX (dashed line) in the NCBI GenBank database from 2017 to 2023, i.e., the quotient of the contaminant amount divided by the total amount displayed in b. See Additional file 1: Table S9 for supporting numerical data.

d Percentage of contaminated genomes in GenBank. Total numbers of screened genomes are shown for six taxonomic kingdom groups: Metazoa (animals), Fungi, Viridiplantae (green plants), Other eukaryotes, Bacteria, and Archaea. Within each group, genomes are placed into four bins corresponding to the amount of contamination per genome, and percentages are calculated as the count of genomes in each bin divided by the total screened genomes.

e Aggregate contamination lengths identified in genomes from six kingdom groups. Colors of grid squares indicate aggregate contamination lengths from eight sources (six kingdoms, plus virus and synthetic) that correspond to percentages of total assembly length for each GenBank kingdom group. See Additional file 1: Table S8 for supporting numerical kingdom contamination summary data.
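
The arithmetic behind panels a and c can be illustrated with a minimal Python sketch. It is not taken from FCS-GX or from the paper's analysis code: the function names, the example lengths, and the choice to place a value of exactly 100% in the top interval are all assumptions made for illustration.

```python
from collections import Counter


def contamination_percent(contaminant_bp: int, total_bp: int) -> float:
    """Percentage of contaminated sequence: contaminant length / total length."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 100.0 * contaminant_bp / total_bp


def bin_by_interval(percentages, width=5):
    """Count genomes per fixed-width percentage interval, keyed by lower bound.

    Assumption: a value of exactly 100% is counted in the last interval.
    """
    counts = Counter()
    for p in percentages:
        lower = min(int(p // width) * width, 100 - width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical (total_bp, contaminant_bp) pairs, not data from the paper.
genomes = [(1_000_000, 20_000), (5_000_000, 300_000), (2_000_000, 2_000_000)]
percentages = [contamination_percent(c, t) for t, c in genomes]
histogram = bin_by_interval(percentages)
```

Here `histogram` maps each interval's lower bound to a genome count, which is the kind of frequency table panel a plots. Applying the same quotient to aggregate lengths per year gives the percentage described for panel c.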