Fig. 1 | Genome Biology

Fig. 1

From: iMOKA: k-mer based software to analyze large collections of sequencing data

Overview of the iMOKA algorithm. The software accepts sequencing reads in FASTQ, FASTA, or BAM format, or as SRR identifiers. The k-mer counts in each file are calculated and stored in a dedicated file format. The k-mers are then filtered using an entropy-boosted Bayes filter with Monte Carlo cross-validation to retain those that are able to classify the input samples. These are combined into graphs and annotated using GMAP or another user-defined aligner. The final list of highly informative k-mers can be explored in the graphical interface to create classification models, inspect individual k-mers, and detect sample outliers using self-organizing maps.
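
As a rough illustration of the k-mer counting step mentioned in the caption, the sketch below counts every k-mer in a handful of reads. It is not iMOKA's implementation: the function name `count_kmers`, the toy reads, the choice of k = 4, and the rule of skipping k-mers that contain `N` are all assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k=4):
    """Count every substring of length k across an iterable of read sequences.

    Hypothetical helper for illustration only; iMOKA stores its counts
    in a dedicated file format that is not reproduced here.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                # Assumption: k-mers with ambiguous bases are skipped.
                continue
            counts[kmer] += 1
    return counts


# Toy reads (made-up data, not taken from the paper).
reads = ["ACGTACGT", "CGTACGNA"]
counts = count_kmers(reads, k=4)

for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A per-sample table of this kind is the sort of input that a downstream filter would compare across samples to find k-mers that separate the classes.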