Fig. 3 | Genome Biology

From: GUNC: detection of chimerism and contamination in prokaryotic genomes

Extensive undetected chimerism in public genome databases and large-scale MAG datasets. a Cumulative plots summarizing genome quality for several reference genome and MAG datasets. The y-axis shows the fraction of genomes passing GUNC filtering at increasing stringency (x-axis), up to the default clade separation score (CSS) threshold of 0.45, conservatively ignoring species-level scores. Note that the Almeida, Pasolli, and Nayfach sets were pre-filtered using variations of the MIMAG medium-quality criterion based on CheckM estimates. GTDB, Genome Taxonomy Database; GMGC, Global Microbial Gene Catalog. b Example of detected contamination in an isolate-derived reference genome in which around one fifth of genes were assigned to a different phylum, scattered across hundreds of small contigs. c Example of detected contamination in a MAG in which genes assigned to two different major phyla were cleanly separated into distinct contigs. d Cumulative plots summarizing the quality of species-level genome bins (SGBs) defined by Pasolli et al. [13]. Lines indicate the fraction of SGBs (y-axis) containing at least one chimeric genome, or only chimeric genomes, at increasingly stringent GUNC cutoffs (x-axis), conservatively ignoring species-level scores. For both series, the intervals span two edge scenarios: genomes with limited reference representation are either conservatively ignored (treated as non-chimeric; upper lines) or aggressively removed (lower lines). The true fraction of chimeric SGBs likely falls in between. e Differential filtering of MAGs in the GMGC set based on CheckM contamination (< 5%), CheckM completeness (> 90%), and GUNC (CSS < 0.45, ignoring species-level scores).
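The filtering logic behind panels a and e can be sketched as below. This is a minimal illustration, not GUNC's or CheckM's code: the `Genome` record, its field names, and the helper functions are hypothetical, and only the thresholds (CSS < 0.45, contamination < 5%, completeness > 90%) come from the caption. As in the caption, only CSS is used for the GUNC check, with species-level scores ignored; GUNC's own pass/fail rule may weigh additional scores and should be checked in its documentation.

```python
from dataclasses import dataclass, field

# Thresholds taken from the caption (panel e).
DEFAULT_CSS = 0.45        # GUNC clade separation score (CSS) cutoff
MAX_CONTAMINATION = 5.0   # CheckM contamination, percent
MIN_COMPLETENESS = 90.0   # CheckM completeness, percent


@dataclass
class Genome:
    """Hypothetical per-genome record; field names are illustrative only."""
    name: str
    completeness: float
    contamination: float
    # GUNC CSS per taxonomic level, e.g. {"phylum": 0.1, "species": 0.8}
    css_by_level: dict = field(default_factory=dict)


def worst_css(g: Genome, ignore=("species",)) -> float:
    """Highest CSS over all levels except the ignored ones.

    A genome with no scored levels gets 0.0, i.e. it is treated as
    non-chimeric (the conservative option described for panel d).
    """
    scores = [s for level, s in g.css_by_level.items() if level not in ignore]
    return max(scores, default=0.0)


def passes_gunc(g: Genome, cutoff: float = DEFAULT_CSS) -> bool:
    """True if no non-ignored level reaches the CSS cutoff."""
    return worst_css(g) < cutoff


def fraction_passing(genomes, cutoffs):
    """Fraction of genomes passing at each cutoff; lower cutoff = more stringent."""
    n = len(genomes)
    return [sum(passes_gunc(g, c) for g in genomes) / n for c in cutoffs]


def filter_flags(g: Genome) -> dict:
    """Which of the three panel-e filters a genome passes."""
    return {
        "checkm_contamination": g.contamination < MAX_CONTAMINATION,
        "checkm_completeness": g.completeness > MIN_COMPLETENESS,
        "gunc": passes_gunc(g),
    }


# Toy data, invented for illustration only.
genomes = [
    Genome("a", 95.0, 1.0, {"phylum": 0.0, "class": 0.1, "species": 0.9}),
    Genome("b", 92.0, 2.0, {"phylum": 0.6, "species": 0.7}),
    Genome("c", 80.0, 8.0, {"phylum": 0.2}),
]

print(fraction_passing(genomes, [1.0, 0.7, 0.45]))
for g in genomes:
    print(g.name, filter_flags(g))
```

In this toy set, genome "b" passes both CheckM filters but is flagged by GUNC, which is the kind of case panel e highlights. Genome "a" passes only because its high species-level score is ignored; calling `worst_css` with `ignore=()` would make it fail at the default cutoff.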