Table 1 Alignment-free sequence comparison tools available for next-generation sequencing data analysis

From: Alignment-free sequence comparison: benefits, applications, and tools

| Category | Analysis | Tool | Primary features | Implementation | Reference |
|---|---|---|---|---|---|
| Mapping | Transcript quantification | kallisto | Transcript abundance quantification from RNA-seq data (uses pseudoalignment to rapidly determine read compatibility with targets) | Software (C++) | [69] |
| | | Sailfish | Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) | Software (C++) | [67] |
| | | Salmon | Quantification of transcript expression from RNA-seq data (uses k-mers) | Software (C++) | [70] |
| | | RNA-Skim | Transcript-level RNA-seq quantification (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mer) | Software (C++) | [68] |
| | Variant calling | ChimeRScope | Fusion transcript prediction using gene k-mer profiles of paired-end RNA-seq reads | Software (Java) | [74] |
| | | FastGT | Genotyping of known SNV/SNP variants directly from raw NGS reads by counting unique k-mers | Software (C) | [73] |
| | | Phy-Mer | Reference-independent mitochondrial haplogroup classifier for NGS data (k-mer based) | Software (Python) | [157] |
| | | LAVA | Genotyping of known SNPs (dbSNP and Affymetrix Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) | Software (C) | [71] |
| | | MICADo | Detection of mutations in targeted third-generation NGS data (can distinguish patient-specific mutations; the algorithm uses k-mers and is based on colored de Bruijn graphs) | Software (Python) | [72] |
| | General mapper | Minimap | Lightweight, fast read mapper and read-overlap detector (uses “minimizers”, a special type of k-mer) | Software (C) | [77] |
| Assembly | De novo genome assembly | MHAP | Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy third-generation reads (10 kbp) using MinHash, a dimensionality reduction technique | Software (Java) | [76] |
| | | Miniasm | Assembler of long noisy reads (SMRT, ONT) using the Overlap–Layout–Consensus (OLC) approach without a separate error-correction stage (uses Minimap) | Software (C) | [77] |
| | | LINKS | Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) | Software (Perl) | [75] |
| | Read clustering | afcluster | Clustering of reads from different genes and different species based on k-mer counts | Software (C++) | [158] |
| | | QCluster | Clustering of reads with alignment-free (k-mer based) measures and quality values | Software (C++) | [159] |
| | Read error correction | Lighter | Correction of sequencing errors in raw whole-genome sequencing reads (k-mer based) | Software (C++) | [94] |
| | | QuorUM | Error corrector for Illumina reads using k-mers | Software (C++) | [93] |
| | | Trowel | | Software (C++) | [95] |
| Metagenomics | Assembly-free phylogenomics | AAF | Phylogeny reconstruction directly from unassembled raw reads from whole-genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) | Software (Python) | [78] |
| | | kSNP v3 | Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) | Software (C) | [80, 81] |
| | | NGS-MC | Phylogeny of species from NGS reads using the alignment-free dissimilarity measures d2* and d2S under different Markov chain models (uses k-words) | R package | [79, 160] |
| | Species identification/taxonomic profiling | CLARK | Taxonomic classification of metagenomic reads against known bacterial genomes using k-mer search and LCA assignment | Software (C++) | [84] |
| | | FOCUS | Reports the organisms present in metagenomic samples and profiles their abundances (composition-based approach with non-negative least squares for prediction) | Web service; Software (Python) | [161] |
| | | GSM | Estimation of microbial genome abundances in metagenomic samples (k-mer based) | Software (Go) | [162] |
| | | Mash | Species identification from assembled or unassembled Illumina, PacBio, and ONT data (based on the MinHash dimensionality-reduction technique) | Software (C++) | [163] |
| | | Kraken | Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads against a comprehensive sequence database | Software (C++) | [83] |
| | | LMAT | Assignment of taxonomic labels to reads by k-mer searches in a precomputed database | Software (C++/Python) | [82] |
| | | stringMLST | k-mer-based tool for MLST directly from genome sequencing reads | Software (Python) | [86] |
| | | Taxonomer | Ultrafast k-mer-based metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples | Web service | [164] |
| | Other | d2-tools | Word-based (k-tuple) comparison of metatranscriptomic samples from NGS reads (pairwise dissimilarity matrix using the d2S measure) | Software (Python/R) | [56, 165] |
| | | VirHostMatcher | Prediction of hosts for metagenomic viral sequences based on ONF, using various distance measures (e.g., d2) | Software (C++) | [153] |
| | | MetaFast | Calculation of statistics for metagenomic sequences, and of the distances between them, based on de Bruijn graph assembly and the Bray–Curtis dissimilarity measure | Software (Java) | [166] |
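1. The up-to-date list of currently available programs can be found at … (accessed 23 August 2017)
2. LCA lowest common ancestor, MLST multilocus sequence typing, NGS next-generation sequencing, ONF oligonucleotide frequency, ONT Oxford Nanopore Technologies, SMRT single-molecule real-time, SNP single-nucleotide polymorphism, SNV single-nucleotide variant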
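Several tools in Table 1, kallisto in particular, replace alignment with a k-mer compatibility test. The sketch below illustrates the core idea of pseudoalignment under simplifying assumptions. The index is a plain dictionary from each k-mer to the set of transcripts containing it; kallisto itself builds a transcriptome de Bruijn graph and works with equivalence classes. Only the forward strand is considered, and the function names and toy sequences are hypothetical. A read is treated as compatible with the transcripts shared by all of its indexed k-mers.

```python
from __future__ import annotations

from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of transcript IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for tid, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(tid)
    return dict(index)


def compatible_transcripts(read: str, index: dict[str, set[str]], k: int) -> set[str]:
    """Return the transcripts compatible with every indexed k-mer of the read.

    The answer is the intersection of the transcript sets of the read's k-mers.
    k-mers absent from the index are skipped (a simplification of this sketch);
    an empty set means no transcript is consistent with all k-mer hits.
    """
    result: set[str] | None = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        result = set(hits) if result is None else result & hits
        if not result:
            break  # the intersection is already empty
    return result if result is not None else set()


if __name__ == "__main__":
    idx = build_index({"tx1": "ACGTACGTGACC", "tx2": "ACGTACGTTTTT"}, k=5)
    print(compatible_transcripts("TACGTGAC", idx, k=5))  # prints {'tx1'}
```

With this toy index, the read TACGTGAC is compatible only with tx1. Its k-mers ACGTG, CGTGA and GTGAC occur in tx1 but not in tx2.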
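Minimap (Table 1) reduces the number of k-mers it indexes by keeping only minimizers, the smallest k-mer in each window of w consecutive k-mers. The following is a minimal sketch of (w, k)-minimizer selection, assuming lexicographic ordering and a single strand. The function name and parameters are illustrative and are not Minimap's API.

```python
from __future__ import annotations


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers of seq, forward strand only.

    In each window of w consecutive k-mers, the smallest k-mer is kept
    (lexicographic order; the leftmost one on ties). A k-mer selected by
    several consecutive windows is reported once.
    """
    n_kmers = len(seq) - k + 1
    if k <= 0 or w <= 0 or n_kmers < w:
        return []
    kms = [seq[i:i + k] for i in range(n_kmers)]
    selected: list[tuple[int, str]] = []
    for start in range(n_kmers - w + 1):
        # Comparing (k-mer, position) tuples picks the smallest k-mer, then the leftmost.
        km, pos = min((kms[i], i) for i in range(start, start + w))
        if not selected or selected[-1][0] != pos:
            selected.append((pos, km))
    return selected


if __name__ == "__main__":
    print(minimizers("ACGTTGCA", k=3, w=2))
    # prints [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (4, 'TGC'), (5, 'GCA')]
```

If two sequences share a substring of at least w + k - 1 bases, they contain a common window and therefore select at least one common minimizer. This property is what makes minimizers usable as seeds. Minimap orders k-mers by a hash and considers both strands. A hash-based order avoids the bias of lexicographic order toward k-mers that start with runs of A.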
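MHAP and Mash (Table 1) both rely on MinHash. Each sequence is reduced to a small sketch of hashed k-mers, and the fraction of shared hashes estimates the Jaccard index of the underlying k-mer sets. The sketch below assumes a bottom-s sketch over canonical k-mers, as described for Mash. It swaps in Python's BLAKE2b for the MurmurHash3 function that Mash uses, and all function names are hypothetical. The last function converts a Jaccard estimate j into the Mash distance D = -(1/k) ln(2j / (1 + j)).

```python
from __future__ import annotations

import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct hashes of canonical k-mers."""
    hashes = {kmer_hash(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate the Jaccard index from two bottom-s sketches.

    Among the s smallest hashes of the union of both sketches, return the
    fraction that is present in both sketches.
    """
    union_bottom = sorted(set(a) | set(b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) ln(2j / (1 + j)); infinite when no hashes are shared."""
    if j == 0:
        return math.inf
    return math.log((1 + j) / (2 * j)) / k
```

Because k-mers are canonicalized, a sequence and its reverse complement produce identical sketches and an estimated Jaccard index of 1.0. The corresponding Mash distance is then 0.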
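NGS-MC and d2-tools (Table 1) compare samples with the d2S dissimilarity. This measure centers each k-word count by its expectation under a background Markov model. The following sketch assumes the simplest background, an i.i.d. (order-0) model whose nucleotide frequencies are estimated from each sample. NGS-MC also supports higher-order Markov chains, which this sketch does not implement.

Write X~_w = X_w - N p_w for the centered count of word w in sample A, and Y~_w likewise for sample B. The statistic is then D2S = sum_w X~_w Y~_w / sqrt(X~_w^2 + Y~_w^2). The dissimilarity is d2S = (1/2) (1 - D2S / sqrt(sum_w X~_w^2 / sqrt(X~_w^2 + Y~_w^2) * sum_w Y~_w^2 / sqrt(X~_w^2 + Y~_w^2))). Function names are hypothetical.

```python
from __future__ import annotations

import itertools
import math
from collections import Counter

ALPHABET = "ACGT"


def word_counts(reads: list[str], k: int) -> Counter:
    """Count overlapping k-words in each read separately (no word spans two reads)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(ALPHABET):  # skip words containing N or other symbols
                counts[word] += 1
    return counts


def centered_counts(reads: list[str], k: int) -> dict[str, float]:
    """Return X~_w = X_w - N * p_w for every word w in {A,C,G,T}^k.

    Assumes an i.i.d. (order-0 Markov) background: p_w is the product of the
    nucleotide frequencies observed in the reads, and N is the number of counted k-words.
    """
    counts = word_counts(reads, k)
    n_words = sum(counts.values())
    bases = Counter(c for read in reads for c in read if c in ALPHABET)
    total = sum(bases.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters in input")
    freq = {b: bases[b] / total for b in ALPHABET}
    centered = {}
    for letters in itertools.product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_w = math.prod(freq[b] for b in word)
        centered[word] = counts[word] - n_words * p_w
    return centered


def d2s_dissimilarity(reads_a: list[str], reads_b: list[str], k: int) -> float:
    """d2S dissimilarity in [0, 1]; 0 for identical (or proportional) centered profiles."""
    x_all = centered_counts(reads_a, k)
    y_all = centered_counts(reads_b, k)
    stat = norm_x = norm_y = 0.0
    for word, x in x_all.items():
        y = y_all[word]
        weight = math.sqrt(x * x + y * y)
        if weight == 0:
            continue  # both centered counts are 0: the word contributes nothing
        stat += x * y / weight  # accumulates the D2S statistic
        norm_x += x * x / weight
        norm_y += y * y / weight
    if norm_x == 0 or norm_y == 0:
        raise ValueError("d2S is undefined when a centered profile is all zeros")
    return 0.5 * (1 - stat / math.sqrt(norm_x * norm_y))
```

Words are counted within each read rather than across concatenated reads, so no spurious k-words span read boundaries. All 4^k words enter the sum, including words absent from both samples. Their centered counts are negative rather than zero.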
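Kraken, CLARK and LMAT (Table 1) classify reads by exact k-mer lookup against a precomputed database. The following sketch follows the scheme described for the original Kraken. Each database k-mer is labeled with the LCA of all taxa whose genomes contain it. A read is assigned to the taxon whose root-to-leaf path collects the most k-mer hits, and ties are resolved by taking the LCA of the tied taxa.

The sketch assumes a toy taxonomy given as a child-to-parent dictionary and uses forward-strand k-mers only. CLARK instead relies on target-specific k-mers, and the real tools use compact on-disk databases. All names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from functools import reduce
from typing import Optional


def path_to_root(taxon: str, parent: dict[str, str]) -> list[str]:
    """Return [taxon, parent, ..., root]; the root is its own parent."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(a: str, b: str, parent: dict[str, str]) -> str:
    """Lowest common ancestor of two taxa in the parent-pointer taxonomy."""
    ancestors_of_a = set(path_to_root(a, parent))
    for taxon in path_to_root(b, parent):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError(f"{a} and {b} share no ancestor")


def build_database(genomes: dict[str, str], k: int, parent: dict[str, str]) -> dict[str, str]:
    """Map each k-mer to the LCA of all taxa whose genome contains it."""
    db: dict[str, str] = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon, parent) if kmer in db else taxon
    return db


def classify(read: str, db: dict[str, str], k: int, parent: dict[str, str]) -> Optional[str]:
    """Kraken-style assignment: pick the taxon with the highest-scoring root-to-leaf path.

    A taxon's score is the number of read k-mers mapped to it or to any of its
    ancestors. Ties are resolved by taking the LCA of the tied taxa.
    Returns None when no k-mer of the read is in the database.
    """
    hits = Counter(db[read[i:i + k]] for i in range(len(read) - k + 1) if read[i:i + k] in db)
    if not hits:
        return None
    scores = {t: sum(hits[a] for a in path_to_root(t, parent)) for t in hits}
    best = max(scores.values())
    tied = [t for t, score in scores.items() if score == best]
    return reduce(lambda a, b: lca(a, b, parent), tied)
```

Consider two species, spA and spB, placed under a common genus. A read whose k-mers are mostly specific to spA is assigned to spA. A read carrying one k-mer specific to each species falls back to the genus.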