Fig. 1 | Genome Biology

Fig. 1

From: StrainGE: a toolkit to track and characterize low-abundance strains in complex microbial communities

StrainGE is a toolkit to track, characterize and compare low-abundance strains in metagenomic samples.

a Overview of the StrainGE pipeline. StrainGST uses a database of high-quality reference genomes to select those most similar to the strains present in a metagenomic sample. StrainGR further characterizes the SNVs and gaps that differ between the references selected by StrainGST and the actual strain present in the sample.

b At each iteration, StrainGST scores each reference strain by comparing the k-mer profile of the reference to the sample k-mers, and reports the reference closest to the most abundant strain in the sample. The k-mers in the reported reference are then removed from the sample, and the process is repeated to search for lower-abundance strains until too few k-mers remain.

c StrainGR uses a short-read alignment-based approach to characterize variation (SNVs and gaps) between the reference(s) identified by StrainGST and the metagenomic sample. Regions shared between the concatenated genomes (gray shaded areas) are detected and excluded from variant calling. Alleles are classified as “strong” or “weak.” After applying rigorous QC metrics, positions in the reference are classified as (i) “reference confirmed” (light gray; a single strong reference allele), (ii) “SNV” (red; a single strong alternative allele), or (iii) “multi-allelic” (blue; multiple strong alleles present, e.g. the blue allele together with the reference allele in gray). The position with a strong reference allele and a weak alternative allele (green; an allele with only limited support in the reads) is classified as “reference confirmed” because only the reference allele is considered strong at that position. The “callable” genome is defined as all positions within the reference with at least one strong allele call.
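
The logic of panels b and c can be illustrated with a small Python sketch. It is not the StrainGE implementation. The score (the fraction of a reference's k-mers still present in the sample), the thresholds `min_score`, `min_kmers` and `strong_min`, and all function names are assumptions made for illustration. The sketch also ignores k-mer abundance and the QC metrics mentioned in the caption.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_references(sample_kmers, references, min_score=0.5, min_kmers=3):
    """Greedy, iterative reference selection in the spirit of panel b.

    references maps a reference name to its k-mer set. The score used here
    is a stand-in (assumption), not the scoring used by StrainGST.
    """
    remaining = set(sample_kmers)
    candidates = {name: ks for name, ks in references.items() if ks}
    selected = []
    while candidates and len(remaining) >= min_kmers:
        # Score each reference by the share of its k-mers left in the sample.
        scores = {name: len(ks & remaining) / len(ks)
                  for name, ks in candidates.items()}
        best = max(scores, key=scores.get)
        if scores[best] < min_score:
            break  # no remaining reference is well supported
        selected.append(best)
        # Remove the reported reference's k-mers, then search again.
        remaining -= candidates.pop(best)
    return selected


def classify_position(allele_counts, ref_allele, strong_min=10):
    """Classify one reference position in the spirit of panel c.

    An allele counts as "strong" if at least strong_min reads support it
    (hypothetical cutoff); all other observed alleles are "weak".
    """
    strong = {a for a, n in allele_counts.items() if n >= strong_min}
    if not strong:
        return "not callable"
    if len(strong) > 1:
        return "multi-allelic"
    return "reference confirmed" if strong == {ref_allele} else "SNV"
```

For example, a position with 30 reads supporting the reference allele and 2 reads supporting an alternative is returned as "reference confirmed" under these assumed cutoffs, matching the green case described in panel c.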
