From: Alignment-free sequence comparison: benefits, applications, and tools
| Category | Name | Features | Implementation | Reference | URL |
|---|---|---|---|---|---|
| Pairwise and multiple sequence comparison | ALF | Calculation of pairwise similarity scores (using N2 measure) for sequences in a FASTA file | Software (C++) | [101] | |
| | Alfree | 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric | Web service; Software (Python) | This article | |
| | decaf+py | 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric | Software (Python) | | |
| | multiAlignFree | Multiple alignment-free sequence comparison using five word-based statistics | R package | [167] | |
| | NASC | Non-aligned sequence comparison: four word-based measures and two IT-based measures | Matlab framework | [38] | |
| Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the average common substring approach | Software (C++) | | |
| | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | [170] | |
| | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | [171] | |
| | CVTree3 | Phylogeny reconstruction from whole genome sequences based on word composition | Web service | | |
| | DLTree | Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method | Web service | [174] | |
| | FFP | Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | | |
| | jD2Stat (JIWA) | Generation of the distance matrix using D2 statistics to extract k-mers from large-scale unaligned genome sequences | Software (Java) | [54] | |
| | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | [175] | |
| | FSWM/kmacs/Spaced | Three tools for alignment-free sequence comparison based on inexact word matches | Software (C++); Web service | Software currently unavailable | |
| | SlopeTree | Whole genome phylogeny that corrects for HGT | Software (C++) | | |
| | Underlying Approach | Phylogeny of whole genomes using composition of subwords | Software (Java) | [139] | |
| Sequence similarity search tool | RAFTS3 | Searches for similar protein sequences against a protein database (>300 times faster than BLAST) | Matlab | [177] | |
| Annotation of long non-coding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames | Software (Perl/R) | [178] | |
| | lncScore | Identification of long non-coding RNA from assembled novel transcripts | Software (Python) | [152] | |
| Horizontal gene transfer | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | | |
| | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | [105] | |
| | Smash | Identification and visualization of DNA rearrangements between pairs of sequences | Software (C) | [179] | |
| | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | | |
| Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | [102] | |
| | MatrixREDUCE | Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters | Software (Python) | [181] | |
| | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | [182] | |
| Sequence clustering | d2_cluster | Word-based clustering of EST and full-length cDNA sequences | Software (C) | [123] | |
| | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable length Markov chains | Software (Python) | [183] | |
| | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | [124] | |
| | kClust | Large-scale clustering of protein sequences (down to 20–30% sequence identity) | Software (C++) | [125] | |
| Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | [184] | |
| | PPI | Identification of protein–protein interactions by coevolution analysis using discrete Fourier transform | Software (Python) | [185] | |
| | VaxiJen | Antigen prediction based on uniform vectors of principal amino acid properties | Web service | [127] | |
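
Many of the tools listed above rely on word-based (k-mer) measures. As a rough illustration of what such a measure computes, the following minimal Python sketch counts overlapping words in two sequences and derives two textbook quantities: the D2 statistic (the sum over all words of the product of their counts in both sequences) and the Euclidean distance between word-count vectors. The function names `kmer_counts`, `d2_statistic`, and `euclidean_distance` are hypothetical and chosen for this example only; the sketch does not reproduce the implementation of any tool in the table.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping words (k-mers) of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(a, b, k):
    """D2 statistic: sum over words w of count_a(w) * count_b(w)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for words absent from b, so unshared words add nothing.
    return sum(n * cb[w] for w, n in ca.items())


def euclidean_distance(a, b, k):
    """Euclidean distance between the two word-count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = ca.keys() | cb.keys()  # union of words seen in either sequence
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


if __name__ == "__main__":
    x, y = "ATGAT", "ATAT"  # toy sequences, for illustration only
    print(d2_statistic(x, y, 2))
    print(euclidean_distance(x, y, 2))
```

Raw counts are used here for simplicity. In practice, word-based measures are often computed on normalized frequencies or on counts corrected for background word probabilities, and the choice of word length k strongly affects the result.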