Fig. 1 | Genome Biology

Fig. 1

From: SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines


Human read depletion performance. a Human read removal using BBDuK, BWA-MEM, and Kontaminant. The remaining numbers of human reads were near identical for BBDuK and Kontaminant (median values of 15,399,252 and 15,399,928, respectively). All conditions retained bacterial reads with near-identical performance (Additional file 2: Figure S1). BBDuK was selected for parameter optimization (b, c). This analysis was performed on raw untrimmed reads of n = 11 simulated datasets. b, c BBDuK parameter optimization in terms of the remaining human reads (b) and remaining bacterial reads (c). Default BBDuK settings were used along with alterations of the MKF and MCF parameters. The default parameters of BBDuK remove a sequencing read in the event of a single k-mer match, whereas MCF50 requires 50% of the bases in a read to be covered by reference k-mers for removal and MKF50 requires 50% of k-mers in a read to match the reference for removal. MCF50-Cancer indicates that BBDuK was run with a database consisting of the GRCh38 human reference genome and a collection of known mutations in human cancer from the COSMIC database. MCF50_Cancer_A denotes a database consisting of human reference genome 38, COSMIC cancer genes, and additional sequences from a recent African “pan-genome” study [44] (b). Default and both MCF50 parameters (with and without cancer sequences) showed the highest removal of human reads.
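The three removal criteria described in the caption (a single k-mer match, MKF50, and MCF50) can be illustrated with a minimal Python sketch. This is not BBDuK's implementation: the function names, the mode labels, the toy reference k-mers, and the use of an inclusive (>=) 50% threshold are all assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def match_stats(read, ref_kmers, k):
    """Return (hit count, fraction of k-mers matching, fraction of bases covered)."""
    hits = [km in ref_kmers for km in kmers(read, k)]
    covered = [False] * len(read)
    for i, hit in enumerate(hits):
        if hit:
            # A matching k-mer starting at position i covers bases i .. i+k-1.
            for j in range(i, i + k):
                covered[j] = True
    kmer_fraction = sum(hits) / len(hits) if hits else 0.0
    cov_fraction = sum(covered) / len(read) if read else 0.0
    return sum(hits), kmer_fraction, cov_fraction


def should_remove(read, ref_kmers, k, mode="default"):
    """Decide whether a read is removed under a hypothetical mode label.

    default: one matching k-mer is enough.
    MKF50:   at least 50% of the read's k-mers match the reference.
    MCF50:   at least 50% of the read's bases are covered by matching k-mers.
    The inclusive threshold is an assumption, not taken from the source.
    """
    n_hits, kmer_fraction, cov_fraction = match_stats(read, ref_kmers, k)
    if mode == "default":
        return n_hits >= 1
    if mode == "MKF50":
        return kmer_fraction >= 0.5
    if mode == "MCF50":
        return cov_fraction >= 0.5
    raise ValueError(f"unknown mode: {mode}")
```

With the toy reference k-mers {"AAAA", "CCCC"} and k = 4, the read "AAAACCCCGG" has 2 matching k-mers out of 7 but 8 of its 10 bases covered, so in this sketch it is removed under the default and MCF50 criteria and kept under MKF50. This shows how two widely spaced matching k-mers can satisfy a coverage threshold while falling short of the k-mer-fraction threshold.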
