Table 2 Alignment-free sequence comparison tools available for research purposes

From: Alignment-free sequence comparison: benefits, applications, and tools

| Category | Name | Features | Implementation | Reference | URL |
|---|---|---|---|---|---|
| Pairwise and multiple sequence comparison | ALF | Calculation of pairwise similarity scores (using N2 measure) for sequences in fasta file | Software (C++) | [101] | |
| | Alfree | 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric | Web service; Software (Python) | This article | |
| | decaf+py | 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric | Software (Python) | [52, 53] | |
| | multiAlignFree | Multiple alignment-free sequence comparison using five word-based statistics | R package | [167] | |
| | NASC | Non-aligned sequence comparison: four word-based measures and two IT-based measures | Matlab framework | [38] | |
| Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the average common substring approach | Software (C++) | [168, 169] | |
| | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | [170] | |
| | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | [171] | |
| | CVTree3 | Phylogeny reconstruction from whole genome sequences based on word composition | Web service | [172, 173] | |
| | DLTree | Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method | Web service | [174] | |
| | FFP | Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | [34, 55, 112] | |
| | jD2Stat (JIWA) | Generation of the distance matrix using D2 statistics to extract k-mers from large-scale unaligned genome sequences | Software (Java) | [54] | |
| | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | [175] | |
| | FSWM/kmacs/Spaced | Three tools for alignment-free sequence comparison based on inexact word matches | Software (C++); Web service | [36, 176] | Software currently unavailable |
| | SlopeTree | Whole genome phylogeny that corrects for HGT | Software (C++) | | |
| | Underlying Approach | Phylogeny of whole genomes using composition of subwords | Software (Java) | [139] | |
| Sequence similarity search tool | RAFTS3 | Searches for similar protein sequences against a protein database (>300 times faster than BLAST) | Matlab | [177] | |
| Annotation of long non-coding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames | Software (Perl/R) | [178] | |
| | lncScore | Identification of long non-coding RNA from assembled novel transcripts | Software (Python) | [152] | |
| Horizontal gene transfer | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | [104, 109] | |
| | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | [105] | |
| | Smash | Identification and visualization of DNA rearrangements between pairs of sequences | Software (C) | [179] | |
| | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | [110, 180] | |
| Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | [102] | |
| | MatrixREDUCE | Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters | Software (Python) | [181] | |
| | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | [182] | |
| Sequence clustering | d2_cluster | Word-based clustering of EST and full-length cDNA sequences | Software (C) | [123] | |
| | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable length Markov chains | Software (Python) | [183] | |
| | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | [124] | |
| | kClust | Large-scale clustering of protein sequences (down to 20–30% sequence identity) | Software (C++) | [125] | |
| Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | [184] | |
| | PPI | Identification of protein–protein interactions by coevolution analysis using discrete Fourier transform | Software (Python) | [185] | |
| | VaxiJen | Antigen prediction based on uniform vectors of principal amino acid properties | Web service | [127] | |
  1. The up-to-date list of currently available programs can be found at Accessed 23 August 2017
  2. HGT: horizontal gene transfer; IT: information theory
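
Many of the tools in the table rely on "word-based" measures, which compare sequences through their k-mer (word) counts rather than through an alignment. As a rough illustration only, the following Python sketch counts overlapping k-mers and derives two simple quantities from them: the D2 statistic (the number of k-mer matches between two sequences) and the Euclidean distance between the count vectors. The function names are hypothetical and are not taken from any tool listed above; the listed programs implement many more measures, often with normalization and background models that this sketch omits.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count all overlapping words of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(x, y, k):
    """D2 statistic: sum over shared k-mers of the product of their counts.

    A higher score means more k-mer matches (a similarity, not a distance).
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def euclidean_distance(x, y, k):
    """Euclidean distance between the two k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for words absent from one of the sequences.
    return sqrt(sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys()))


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTTCGT"
    print(d2_score(a, b, 3), euclidean_distance(a, b, 3))
```

Raw counts depend on sequence length, so in practice word-based measures are usually computed on frequencies or on counts corrected for background composition; the choice of k also strongly affects the result.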