Table 1 Alignment-free sequence comparison tools available for next-generation sequencing data analysis

From: Alignment-free sequence comparison: benefits, applications, and tools

Category Analysis Tool Primary features Implementation Reference URL
Mapping Transcript quantification kallisto Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets) Software (C++) [69] https://pachterlab.github.io/kallisto/
Sailfish Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) Software (C++) [67] http://www.cs.cmu.edu/~ckingsf/software/sailfish/
Salmon Quantification of the expression of transcripts using RNA-seq data (uses k-mers) Software (C++) [70] https://combine-lab.github.io/salmon/
RNA-Skim RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mers) Software (C++) [68] http://www.csbio.unc.edu/rs/
Variant calling ChimeRScope Fusion transcript prediction using gene k-mers profiles of the RNA-seq paired-end reads Software (Java) [74] https://github.com/ChimeRScope/ChimeRScope/wiki
FastGT Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers Software (C) [73] https://github.com/bioinfo-ut/GenomeTester4/
Phy-Mer Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based) Software (Python) [157] https://github.com/danielnavarrogomez/phy-mer
LAVA Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) Software (C) [71] http://lava.csail.mit.edu/
MICADo Detection of mutations in targeted third-generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs) Software (Python) [72] http://github.com/cbib/MICADo
General mapper Minimap Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mers) Software (C) [77] https://github.com/lh3/minimap
Assembly De novo genome assembly MHAP Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy third-generation reads (10 kbp) using MinHash, a dimensionality-reduction technique Software (Java) [76] https://github.com/marbl/MHAP
Miniasm Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without requiring an error-correction stage (uses minimap) Software (C) [77] https://github.com/lh3/miniasm
LINKS Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) Software (Perl) [75] https://github.com/warrenlr/LINKS/
Read clustering afcluster Clustering of reads from different genes and different species based on k-mer counts Software (C++) [158] https://github.com/luscinius/afcluster
QCluster Clustering of reads with alignment-free measures (k-mer based) and quality values Software (C++) [159] http://www.dei.unipd.it/~ciompin/main/qcluster.html
Reads error correction Lighter Correction of sequencing errors in raw, whole genome sequencing reads (k-mer based) Software (C++) [94] https://github.com/mourisl/Lighter
QuorUM Error corrector for Illumina reads using k-mers Software (C++) [93] https://github.com/gmarcais/Quorum
Trowel Error corrector for Illumina reads using k-mers Software (C++) [95] https://sourceforge.net/projects/trowel-ec/
Metagenomics Assembly-free phylogenomics AAF Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) Software (Python) [78] https://github.com/fanhuan/AAF
kSNP v3 Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) Software (C) [80, 81] https://sourceforge.net/projects/ksnp/files/
NGS-MC Phylogeny of species based on NGS reads using the alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words) R package [79, 160] http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html
Species identification/taxonomic profiling CLARK Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment Software (C++) [84] http://clark.cs.ucr.edu/
FOCUS Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction) Web service; Software (Python) [161] http://edwards.sdsu.edu/FOCUS/
GSM Estimation of abundances of microbial genomes in metagenomic samples (k-mer based) Software (Go) [162] https://github.com/pdtrang/GSM
Mash Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on the MinHash dimensionality-reduction technique) Software (C++) [163] https://github.com/marbl/mash
Kraken Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database Software (C++) [83] https://ccb.jhu.edu/software/kraken/
LMAT Assignment of taxonomic labels to reads by k-mer searches in a precomputed database Software (C++/Python) [82] https://sourceforge.net/projects/lmat/
stringMLST k-mer-based tool for MLST directly from genome sequencing reads Software (Python) [86] http://jordan.biology.gatech.edu/page/software/stringMLST
Taxonomer k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples Web service [164] http://taxonomer.iobio.io/
Other d2-tools Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d2S measure) of metatranscriptomic samples from NGS reads Software (Python/R) [56, 165] https://code.google.com/p/d2-tools/
VirHostMatcher Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency (ONF) using various distance measures (e.g., d2) Software (C++) [153] https://github.com/jessieren/VirHostMatcher
MetaFast Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and the Bray–Curtis dissimilarity measure Software (Java) [166] https://github.com/ctlab/metafast
  1. The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017
  2. LCA lowest common ancestor, NGS next-generation sequencing, SNP single-nucleotide polymorphism, SNV single-nucleotide variant
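
Several tools in the table (e.g., MHAP and Mash) are described as MinHash-based, and most of the others work on k-mers. The following is a minimal Python sketch of how a k-mer set can be reduced to a small MinHash sketch and used to estimate the Jaccard similarity between two sequences. It is an illustration only, not the implementation used by any listed tool: the function names, the bottom-s sketch variant, and the use of SHA-1 as the hash function are assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer.

    Assumption for this example: the first 8 bytes of SHA-1 are used;
    real tools typically use faster non-cryptographic hashes.
    """
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest hash values of the k-mers."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:s]


def jaccard_estimate(sk_a, sk_b, s):
    """Estimate the Jaccard similarity from two bottom-s sketches.

    Take the s smallest hashes of the merged sketches and count how
    many of them occur in both input sketches.
    """
    merged = sorted(set(sk_a) | set(sk_b))[:s]
    if not merged:
        return 0.0
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def jaccard_exact(seq_a, seq_b, k):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    a, b = "ACGTACGTGG", "ACGTACGTCC"
    k, s = 4, 100
    print(jaccard_exact(a, b, k))
    print(jaccard_estimate(sketch(a, k, s), sketch(b, k, s), s))
```

When the sketch size s is at least the number of distinct k-mers, the estimate coincides with the exact value (barring hash collisions); with smaller s it becomes an approximation whose cost depends on s rather than on sequence length, which is the property that makes this kind of sketch attractive for large read sets.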