MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs

Hamady, Micah; Widmann, Jeremy; Copley, Shelley D; Knight, Rob

doi:10.1186/gb-2008-9-8-r128

Table 1 Summary of key features of MotifCluster and a selection of other programs that perform clustering of motifs or remote homology detection

Strategy	Program	Overview of program	Publication
Clustering proteins by motifs they contain	MotifCluster	Takes aligned or unaligned protein and nucleotide sequences and a MEME file showing motifs; allows clustering of the sequences according to the motifs they contain, and visualization of the motifs on the aligned and unaligned sequences and three-dimensional structures	This article
Clustering of transcription factor binding sites (in DNA)	MCAST	Takes list of transcription factor binding sites as input: uses hidden Markov models to find cis-regulatory modules in DNA	[21]
	Cluster-Buster	Takes list of transcription factor binding sites as input: uses Forward algorithm and expected uniform distribution to find motif co-occurrence in DNA	[22]
	ClusterDraw	Takes list of transcription factor binding sites as input: uses r-scan algorithm and sweep over parameter values to visualize significant clusters as peaks on the DNA sequence	[23]
	COMET	Calculates significance of collection of position-specific score matrices that appear in order: can apply to DNA or protein, in principle	[24]
	PEAKS	Calculates significance of collection of transcription factor binding sites that appear at specified distance from transcription start site or other feature in the DNA	[25]
	CompMoby	Aligns all pairs of motifs that appear significant in different promoters, then groups these into clusters using the CAST algorithm. DNA-specific	[26]
	CREME	Identifies groups of DNA motifs that co-occur significantly within a defined distance using both order-dependent and order-independent models	[27]
	PHYLOCLUS	Uses Bayesian method to find clusters of evolutionarily conserved DNA motifs that appear in different promoters	[28]
	INCLUSive	Clusters genes based on microarray analysis: feeds promoters to Gibbs sampler to find DNA motifs overrepresented in each cluster	[29]
Identifying kernels for SVMs*	SVM kernels	Introduces kernels based on k-word occurrences and best BLAST hit for SVM clustering: does not focus on conserved motifs	[30]
	WCM (word correlation matrices)	Introduces k-word kernel for SVM clustering based on correlations in appearance of pairs of k-words: does not focus on conserved motifs	[31]
	ODH (oligomer distance histograms)	Introduces new kernel for SVM clustering based on histograms of distances between all words in protein: does not focus on conserved motifs	[32]
Iterative BLAST	Shotgun	BLAST-based approach for identifying remote homologs by iterative searches: not motif-based	[3]
	DivergentSet	Among other features, can perform BLAST and PSI-BLAST versions of Shotgun and choose representative sequences of each group: not motif-based	[20]
	Cascade PSI-BLAST	Performs iterative steps of PSI-BLAST, otherwise like Shotgun: not motif-based	[33]
	ProClust	Performs graph-based connection of proteins based on pairwise sequence similarity: not motif-based	[34]
k-word clustering	CD-Hit	Clusters proteins based on shared segments of overall sequence, not by motifs already known to be significant	[35]
Profile-profile alignment	COMPASS	Performs profile-profile alignments for remote homology detection: assesses statistical significance of matches in the profiles overall, rather than specifically using shared motifs	[1]
Clustering of motifs	STAMP	Aligns motifs with one another so that relationships among motifs can be detected; performs many other tasks for promoter characterization, but specific to promoters	[36]
	TAMO	Performs many functions for cis-regulatory analysis: is able to cluster DNA motifs with one another	[37]
	SOMBRERO	Aligns and clusters DNA motifs with one another to improve transcription factor binding site searches	[38]
Identification of functions in labeled structures	FunClust	Takes set of three-dimensional structures with annotated functions; identifies three-dimensional motif fragments that are common to the structures with each function	[39]

*SVMs are support vector machines, a common machine learning approach to pattern classification. A kernel is a function that calculates the inner product of a pair of input vectors in an abstract feature space; evaluating it for all pairs of inputs is an important step in the process, and the choice of kernel affects the clustering.
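
As a rough illustration of what a kernel computes, the sketch below builds a simple k-word kernel in Python: each sequence is mapped to a vector of overlapping k-word counts, and the kernel value for two sequences is the inner product of those vectors. This is a generic example, not the kernels of references [30]-[32]; the function names, the choice of k = 3, and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping k-word (substring of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kword_kernel(seq_a, seq_b, k=3):
    """Inner product of the k-word count vectors of two sequences."""
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Words absent from seq_b contribute zero (Counter returns 0 for them).
    return sum(n * counts_b[word] for word, n in counts_a.items())


def kernel_matrix(seqs, k=3):
    """Kernel value for every pair of sequences (symmetric matrix as nested lists)."""
    return [[kword_kernel(a, b, k) for b in seqs] for a in seqs]


# Made-up toy sequences, not taken from the article.
seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHWWPPQ"]
K = kernel_matrix(seqs, k=3)
for row in K:
    print(row)
```

A matrix of this kind is what a kernel-based method consumes in place of the raw sequences; larger entries mean that two sequences share more k-words. Changing k or the way words are counted changes the matrix, which is one way the choice of kernel affects the outcome.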

ISSN: 1474-760X

Genome Biology