Fig. 2 | Genome Biology

Fig. 2

From: Contamination detection in genomic data: more is not enough


Overview of algorithms. The algorithms are clustered according to their operating principles, as described in the section “Overview of algorithms”. Squares at the top of the figure represent specific features of the algorithms. Non-redundant means that the software can detect contaminant genes that have no equivalent in the surveyed genome. Intra-species means that the algorithm can detect contamination at the species level. Inter-domain means that the algorithm can detect prokaryotic and eukaryotic contamination simultaneously. Database features indicate whether the algorithm can use the GTDB taxonomy and/or a moderately contaminated reference database. Expected organism indicates whether the algorithm can identify the main organism by itself and/or whether the user can specify it. Additional functionalities lists distinctive functions of the programs, such as reporting the completeness of a genome, cleaning a genome of its contaminants, filtering reads based on their taxonomy (positive filtering), or enriching multiple sequence alignments (MSAs) with orthologous sequences while controlling the taxonomy.
