Fig. 1 | Genome Biology

Fig. 1

From: GUNC: detection of chimerism and contamination in prokaryotic genomes

GUNC quantifies chimerism in prokaryotic genomes. a Genome contamination may originate in vitro (e.g., from culture media, laboratory equipment or kits, or index hopping during multiplexed sequencing) or in silico (contig misassembly, erroneous binning). Genomes are represented as circular chromosomes, contigs as sequences of genes (dots). b Two types of genome contamination can be distinguished operationally: redundant contamination by surplus genomic material (“more of the same”) and non-redundant contamination by non-overlapping fragments from distantly related lineages (“something new,” e.g., novel or distant orthologs). Different single-copy marker genes (SCGs) are shown as solid shapes, other genes as dashed circles; colors indicate different source lineages. c GUNC workflow. For a given query genome, genes are called using Prodigal and then mapped to the GUNC reference database (based on proGenomes 2.1) using DIAMOND. The mappings are used to compute GUNC scores and to generate interactive Sankey diagrams that visualize the taxonomic composition of the genome. GUNC quantifies genome chimerism and reference representation across taxonomic levels. Clade separation scores (CSS) are high if the classification of genes into distinct lineages (represented by different colors) follows contig boundaries. Reference representation scores (RRS) are high if genes map closely and consistently into the GUNC reference space. The top example illustrates a chimeric genome with good reference representation, the bottom example a non-contaminated genome that is not well represented in the GUNC reference.
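The intuition behind the clade separation score in panel c (high when lineage assignments follow contig boundaries) can be illustrated with a small entropy-based sketch. This is not GUNC's published formula or code: the function names `entropy` and `separation_score` are hypothetical, and the normalization used here (one minus the ratio of conditional to total lineage entropy) is an assumption chosen only to reproduce the qualitative behavior the caption describes.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def separation_score(genes):
    """Illustrative score, NOT the GUNC CSS definition.

    genes: list of (contig_id, lineage) pairs, one pair per gene.
    Returns a value near 1 when each contig carries a single lineage
    and near 0 when lineages are mixed evenly within contigs.
    """
    lineages = [lineage for _, lineage in genes]
    h_total = entropy(lineages)
    if h_total == 0:
        # Only one lineage present: there is nothing to separate.
        return 0.0

    # Group lineage labels by the contig each gene sits on.
    by_contig = defaultdict(list)
    for contig, lineage in genes:
        by_contig[contig].append(lineage)

    # Conditional entropy H(lineage | contig), weighted by contig size.
    n = len(genes)
    h_cond = sum(len(v) / n * entropy(v) for v in by_contig.values())
    return 1 - h_cond / h_total


# Lineages follow contig boundaries (chimeric pattern in the caption).
chimeric = [("contig1", "A")] * 5 + [("contig2", "B")] * 5
# Lineages are mixed within every contig (no contig-level structure).
mixed = [("contig1", "A"), ("contig1", "B"), ("contig2", "A"), ("contig2", "B")]
# A single lineage throughout (no sign of chimerism).
clean = [("contig1", "A")] * 4 + [("contig2", "A")] * 4

print(separation_score(chimeric), separation_score(mixed), separation_score(clean))
```

In this toy setting the first genome scores highest because knowing the contig fully determines the lineage, which is the pattern the caption associates with a high CSS.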
