Fig. 1 | Genome Biology

Fig. 1

From: Metalign: efficient alignment-based metagenomic profiling via containment min hash

Metalign overview. a The input to Metalign is sequencing reads and a reference database. b The pre-filtering stage, based on an implementation of the theoretical concept of containment min hash [14], quickly estimates the percentage of k-mers in each reference genome that are also present in the reads. Metalign then selects a small “subset” database consisting of the reference genomes whose containment percentage exceeds a chosen threshold. c Metalign then aligns the reads to the reference genomes in the subset database and outputs a profile in the standardized, community-driven format used by OPAL [17] and CAMI [7]. On in vitro mock community data, Metalign reduced runtime from 513 to 5 min and false-positive genera from 542 to 2 compared with naive alignment without pre-filtering.
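The pre-filtering idea in panel b can be illustrated with a minimal sketch. This is not Metalign's implementation: the function names (`kmers`, `stable_hash`, `estimate_containment`, `select_subset`), the toy sequences, and the values of `k`, `sketch_size`, and `threshold` are all assumptions chosen for illustration, and a plain Python set stands in for the probabilistic membership structure a real containment min hash implementation would likely use. The sketch samples the genome k-mers with the smallest hash values, counts how many also occur in the reads, and keeps genomes whose estimated containment reaches the threshold.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def estimate_containment(genome, read_kmers, k, sketch_size):
    """Estimate the fraction of genome k-mers that also occur in the reads.

    Only the sketch_size genome k-mers with the smallest hash values are tested,
    so the result is an estimate whenever the genome has more k-mers than that.
    """
    sketch = sorted(kmers(genome, k), key=stable_hash)[:sketch_size]
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in read_kmers)
    return hits / len(sketch)


def select_subset(genomes, reads, k=5, sketch_size=50, threshold=0.5):
    """Return the names of genomes whose estimated containment is at least threshold.

    k, sketch_size, and threshold are illustrative values, not Metalign defaults.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return [
        name
        for name, seq in genomes.items()
        if estimate_containment(seq, read_kmers, k, sketch_size) >= threshold
    ]


# Toy data (hypothetical): the reads are drawn only from the "present" genome.
genomes = {
    "present": "ACGTTGCAAGGCTTAACCGGATCGA",
    "absent": "TTTTTTTTTTGGGGGGGGGGCCCCC",
}
reads = [genomes["present"][0:15], genomes["present"][10:25]]
subset = select_subset(genomes, reads)
print(subset)
```

In this toy case only the "present" genome passes the filter, so the alignment step in panel c would run against one reference instead of two. The same reduction, at database scale, is what the caption credits for the lower runtime and fewer false positives.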