Table 2 Alignment-free sequence comparison tools available for research purposes

From: Alignment-free sequence comparison: benefits, applications, and tools

Category Name Features Implementation Reference URL
Pairwise and multiple sequence comparison ALF Calculation of pairwise similarity scores (using N2 measure) for sequences in fasta file Software (C++) [101] https://github.com/seqan/seqan/tree/master/apps/alf
Alfree 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric Web service; Software (Python) This article http://www.combio.pl/alfree
decaf+py 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric Software (Python) [52, 53] http://bioinformatics.org.au/tools/decaf+py/
multiAlignFree Multiple alignment-free sequence comparison using five word-based statistics R package [167] http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/
NASC Non-aligned sequence comparison: four word-based measures and two IT-based measures Matlab framework [38] http://web.ist.utl.pt/susanavinga/NASC/
Whole-genome phylogeny ALFRED, ALFRED-G Phylogenetic tree reconstruction based on the average common substring approach Software (C++) [168, 169] http://alurulab.cc.gatech.edu/phylo
andi Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes Software (C) [170] https://github.com/evolbioinf/andi/
CAFE Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) Software (C) [171] https://github.com/younglululu/CAFE
CVTree3 Phylogeny reconstruction from whole genome sequences based on word composition Web service [172, 173] http://tlife.fudan.edu.cn/cvtree3
DLTree Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method Web service [174] http://dltree.xtu.edu.cn
FFP Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) Software (C/Perl) [34, 55, 112] https://sourceforge.net/projects/ffp-phylogeny/
jD2Stat (JIWA) Generation of the distance matrix using D2 statistics to extract k-mers from large-scale unaligned genome sequences Software (Java) [54] http://bioinformatics.org.au/tools/jD2Stat/
kr Efficient word-based estimation of mutation distances from unaligned genomes Software (C) [175] http://guanine.evolbio.mpg.de/cgi-bin/kr2/kr.cgi.pl
FSWM/kmacs/Spaced Three tools for alignment-free sequence comparison based on inexact word matches Software (C++); Web service [36, 176] Software currently unavailable
SlopeTree Whole genome phylogeny that corrects for HGT Software (C++)   http://prodata.swmed.edu/download/pub/slopetree_v1/
Underlying Approach Phylogeny of whole genomes using composition of subwords Software (Java) [139] http://www.dei.unipd.it/~ciompin/main/underlying.html
Sequence similarity search tool RAFTS3 Searches of similar protein sequences against a protein database (>300 times faster than BLAST) Matlab [177] https://sourceforge.net/projects/rafts3/
Annotation of long non-coding RNA FEELnc Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames Software (Perl/R) [178] https://github.com/tderrien/FEELnc
lncScore Identification of long non-coding RNA from assembled novel transcripts Software (Python) [152] https://github.com/WGLab/lncScore
Horizontal gene transfer alfy Alignment-free local homology calculation for detecting horizontal gene transfer Software (C) [104, 109] http://guanine.evolbio.mpg.de/alfy/
rush Detection of recombination between two unaligned DNA sequences Software (C) [105] http://guanine.evolbio.mpg.de/rush/
Smash Identification and visualization of DNA rearrangements between pairs of sequences Software (C) [179] http://bioinformatics.ua.pt/software/smash/
TF-IDF Detection of HGT regions and the transfer direction in nucleotide/protein sequences Software (C++) [110, 180] https://github.com/congyingnan/TF-IDF
Regulatory elements D2Z Identification of functionally related homologous regulatory elements Software (Perl) [102] http://veda.cs.uiuc.edu/d2z/
MatrixREDUCE Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters Software (Python) [181] https://systemsbiology.columbia.edu/matrixreduce
RRS Detection of functionally similar groups of enhancers and their regions Software (Perl/C) [182] http://goo.gl/7gW578
Sequence clustering d2_cluster Word-based clustering of EST and full-length cDNA sequences Software (C) [123] https://github.com/shaze/wcdest/
d2-vlmc Word-based clustering of metatranscriptomic samples using variable length Markov chains Software (Python) [183] https://d2vlmc.codeplex.com/
mBKM Clustering of DNA sequences using Shannon entropy and Euclidean distance Software (Java) [124] https://github.com/Huiyang520/DMk-BKmeans
kClust Large-scale clustering of protein sequences (down to 20–30% sequence identity) Software (C++) [125] https://github.com/soedinglab/kClust
Other COMET Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression Web service [184] https://comet.lih.lu/
PPI Identification of protein–protein interaction by coevolution analysis using discrete Fourier transform Software (Python) [185] https://github.com/cyinbox/PPI
VaxiJen Antigen prediction based on uniform vectors of principal amino acid properties Web service [127] http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html
  1. The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017
  2. HGT: horizontal gene transfer; IT: information theory
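
Many of the tools listed above rely on word-based (k-mer) measures, among them the D2 statistic. The following is a minimal sketch of that idea, assuming the classical definition of D2 as the sum, over all words of length k, of the product of the word counts in the two sequences. The function names `kmer_counts` and `d2_statistic` are illustrative only and are not taken from any of the listed programs.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words (k-mers) of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """Classical D2: sum over words w of count_a(w) * count_b(w).

    Words absent from either sequence contribute zero, so only the
    words seen in seq_a need to be visited.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to exercise the functions.
    print(d2_statistic("ACGTACGT", "ACGTTTGA", k=3))
```

A larger D2 value indicates more shared words between the two sequences. The raw statistic depends on sequence length and composition, so it is usually normalized before being used as a similarity or distance measure.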