Alignment-free sequence comparison: benefits, applications, and tools

Zielezinski, Andrzej; Vinga, Susana; Almeida, Jonas; Karlowski, Wojciech M.

doi:10.1186/s13059-017-1319-7

Table 2 Alignment-free sequence comparison tools available for research purposes

Category	Name	Features	Implementation	Reference	URL
Pairwise and multiple sequence comparison	ALF	Calculation of pairwise similarity scores (using the N2 measure) for sequences in a FASTA file	Software (C++)	[101]	https://github.com/seqan/seqan/tree/master/apps/alf
	Alfree	25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric	Web service; Software (Python)	This article	http://www.combio.pl/alfree
	decaf + py	13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric	Software (Python)	[52, 53]	http://bioinformatics.org.au/tools/decaf+py/
	multiAlignFree	Multiple alignment-free sequence comparison using five word-based statistics	R package	[167]	http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/
	NASC	Non-aligned sequence comparison: four word-based measures and two IT-based measures	Matlab framework	[38]	http://web.ist.utl.pt/susanavinga/NASC/
Whole-genome phylogeny	ALFRED, ALFRED-G	Phylogenetic tree reconstruction based on the average common substring approach	Software (C++)	[168, 169]	http://alurulab.cc.gatech.edu/phylo
	andi	Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based d_a measure); scalable to thousands of bacterial genomes	Software (C)	[170]	https://github.com/evolbioinf/andi/
	CAFE	Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures)	Software (C)	[171]	https://github.com/younglululu/CAFE
	CVTree3	Phylogeny reconstruction from whole genome sequences based on word composition	Web service	[172, 173]	http://tlife.fudan.edu.cn/cvtree3
	DLTree	Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method	Web service	[174]	http://dltree.xtu.edu.cn
	FFP	Feature frequency profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale)	Software (C/Perl)	[34, 55, 112]	https://sourceforge.net/projects/ffp-phylogeny/
	jD2Stat (JIWA)	Generation of the distance matrix using D₂ statistics to extract k-mers from large-scale unaligned genome sequences	Software (Java)	[54]	http://bioinformatics.org.au/tools/jD2Stat/
	kr	Efficient word-based estimation of mutation distances from unaligned genomes	Software (C)	[175]	http://guanine.evolbio.mpg.de/cgi-bin/kr2/kr.cgi.pl
	FSWM/kmacs/Spaced	Three tools for alignment-free sequence comparison based on inexact word matches	Software (C++); Web service	[36, 176]	Software currently unavailable
	SlopeTree	Whole genome phylogeny that corrects for HGT	Software (C++)		http://prodata.swmed.edu/download/pub/slopetree_v1/
	Underlying Approach	Phylogeny of whole genomes using composition of subwords	Software (Java)	[139]	http://www.dei.unipd.it/~ciompin/main/underlying.html
Sequence similarity search tool	RAFTS3	Searches of similar protein sequences against a protein database (>300 times faster than BLAST)	Matlab	[177]	https://sourceforge.net/projects/rafts3/
Annotation of long non-coding RNA	FEELnc	Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames	Software (Perl/R)	[178]	https://github.com/tderrien/FEELnc
	lncScore	Identification of long non-coding RNA from assembled novel transcripts	Software (Python)	[152]	https://github.com/WGLab/lncScore
Horizontal gene transfer	alfy	Alignment-free local homology calculation for detecting horizontal gene transfer	Software (C)	[104, 109]	http://guanine.evolbio.mpg.de/alfy/
	rush	Detection of recombination between two unaligned DNA sequences	Software (C)	[105]	http://guanine.evolbio.mpg.de/rush/
	Smash	Identification and visualization of DNA rearrangements between pairs of sequences	Software (C)	[179]	http://bioinformatics.ua.pt/software/smash/
	TF-IDF	Detection of HGT regions and the transfer direction in nucleotide/protein sequences	Software (C++)	[110, 180]	https://github.com/congyingnan/TF-IDF
Regulatory elements	D2Z	Identification of functionally related homologous regulatory elements	Software (Perl)	[102]	http://veda.cs.uiuc.edu/d2z/
	MatrixREDUCE	Prediction of functional regulatory targets of transcription factors (TFs) by predicting the total affinity of each promoter and orthologous promoters	Software (Python)	[181]	https://systemsbiology.columbia.edu/matrixreduce
	RRS	Detection of functionally similar groups of enhancers and their regions	Software (Perl/C)	[182]	http://goo.gl/7gW578
Sequence clustering	d2_cluster	Word-based clustering of EST and full-length cDNA sequences	Software (C)	[123]	https://github.com/shaze/wcdest/
	d2-vlmc	Word-based clustering of metatranscriptomic samples using variable length Markov chains	Software (Python)	[183]	https://d2vlmc.codeplex.com/
	mBKM	Clustering of DNA sequences using Shannon entropy and Euclidean distance	Software (Java)	[124]	https://github.com/Huiyang520/DMk-BKmeans
	kClust	Large-scale clustering of protein sequences (down to 20–30% sequence identity)	Software (C++)	[125]	https://github.com/soedinglab/kClust
Other	COMET	Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression	Web service	[184]	https://comet.lih.lu/
	PPI	Identification of protein–protein interaction by coevolution analysis using discrete Fourier transform	Software (Python)	[185]	https://github.com/cyinbox/PPI
	VaxiJen	Antigen prediction based on uniform vectors of principal amino acid properties	Web service	[127]	http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html

The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017
HGT horizontal gene transfer, IT information theory
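
Many of the tools above are described as "word-based". As a rough illustration of what that means, the following Python sketch counts overlapping words (k-mers) in two sequences and derives two simple scores from the counts: the plain D2 statistic (the sum, over shared words, of the product of their counts) and the Euclidean distance between the count vectors. The function names and the choice of k are assumptions made for this example; the sketch is not the implementation used by any tool in the table, several of which apply normalised or otherwise refined variants.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(x: str, y: str, k: int) -> int:
    """Plain D2: sum over shared words of (count in x) * (count in y)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[word] for word, n in cx.items() if word in cy)


def euclidean_distance(x: str, y: str, k: int) -> float:
    """Euclidean distance between the two k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    words = set(cx) | set(cy)  # union, so words absent from one side count as 0
    return sqrt(sum((cx[w] - cy[w]) ** 2 for w in words))


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTTGCA"  # toy sequences, hypothetical input
    print(d2_statistic(a, b, k=3), euclidean_distance(a, b, k=3))
```

A higher D2 value indicates more shared words (greater similarity), whereas a higher Euclidean distance indicates greater dissimilarity, so the two scores are not directly interchangeable.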

ISSN: 1474-760X

Genome Biology