Table 1 Summary of key features of MotifCluster and a selection of other programs that perform clustering of motifs or remote homology detection

From: MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs

| Strategy | Program | Overview of program | Publication |
| --- | --- | --- | --- |
| Clustering proteins by motifs they contain | MotifCluster | Takes aligned or unaligned protein and nucleotide sequences and a MEME file showing motifs; allows clustering of the sequences according to the motifs they contain, and visualization of the motifs on the aligned and unaligned sequences and three-dimensional structures | This article |
| Clustering of transcription factor binding sites (in DNA) | MCAST | Takes list of transcription factor binding sites as input: uses hidden Markov models to find cis-regulatory modules in DNA | [21] |
| | Cluster-Buster | Takes list of transcription factor binding sites as input: uses Forward algorithm and expected uniform distribution to find motif co-occurrence in DNA | [22] |
| | ClusterDraw | Takes list of transcription factor binding sites as input: uses r-scan algorithm and sweep over parameter values to visualize significant clusters as peaks on the DNA sequence | [23] |
| | COMET | Calculates significance of collection of position-specific score matrices that appear in order: can apply to DNA or protein, in principle | [24] |
| | PEAKS | Calculates significance of collection of transcription factor binding sites that appear at specified distance from transcription start site or other feature in the DNA | [25] |
| | CompMoby | Aligns all pairs of motifs that appear significant in different promoters, then groups these into clusters using the CAST algorithm: DNA-specific | [26] |
| | CREME | Identifies groups of DNA motifs that co-occur significantly within a defined distance using both order-dependent and order-independent models | [27] |
| | PHYLOCLUS | Uses Bayesian method to find clusters of evolutionarily conserved DNA motifs that appear in different promoters | [28] |
| | INCLUSive | Clusters genes based on microarray analysis: feeds promoters to Gibbs sampler to find DNA motifs overrepresented in each cluster | [29] |
| Identifying kernels for SVMs* | SVM kernels | Introduces kernels based on k-word occurrences and best BLAST hit for SVM clustering: does not focus on conserved motifs | [30] |
| | WCM (word correlation matrices) | Introduces k-word kernel for SVM clustering based on correlations in appearance of pairs of k-words: does not focus on conserved motifs | [31] |
| | ODH (oligomer distance histograms) | Introduces new kernel for SVM clustering based on histograms of distances between all words in protein: does not focus on conserved motifs | [32] |
| Iterative BLAST | Shotgun | BLAST-based approach for identifying remote homologs by iterative searches: not motif-based | [3] |
| | DivergentSet | Among other features, can perform BLAST and PSI-BLAST versions of Shotgun and choose representative sequences of each group: not motif-based | [20] |
| | Cascade PSI-BLAST | Performs iterative steps of PSI-BLAST, otherwise like Shotgun: not motif-based | [33] |
| | ProClust | Performs graph-based connection of proteins based on pairwise sequence similarity: not motif-based | [34] |
| k-word clustering | CD-Hit | Clusters proteins based on shared segments of overall sequence, not by motifs already known to be significant | [35] |
| Profile-profile alignment | COMPASS | Performs profile-profile alignments for remote homology detection: assesses the statistical significance of matches in the profiles overall, rather than specifically using shared motifs | [1] |
| Clustering of motifs | STAMP | Aligns motifs with one another so that relationships among motifs can be detected; performs many other tasks for promoter characterization, but specific to promoters | [36] |
| | TAMO | Performs many functions for cis-regulatory analysis: is able to cluster DNA motifs with one another | [37] |
| | SOMBRERO | Aligns and clusters DNA motifs with one another to improve transcription factor binding site searches | [38] |
| Identification of functions in labeled structures | FunClust | Takes set of three-dimensional structures with annotated functions: identifies three-dimensional motif fragments that are common to the structures with each function | [39] |

*SVMs are support vector machines, a common machine learning approach to pattern classification. A kernel is a function that calculates the inner product of a pair of input vectors in an abstract space; evaluating it for all pairs of inputs is an important step in the process and affects the clustering.
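To make the kernel idea in the footnote concrete, the following is a minimal sketch of a k-word kernel, assuming the simplest possible formulation: each sequence is represented by its vector of overlapping k-word counts, and the kernel value is the inner product of two such vectors. The function names (`kword_counts`, `spectrum_kernel`, `gram_matrix`) and the toy sequences are hypothetical illustrations; this is not the method of any program listed in the table.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping k-word in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the k-word count vectors of sequences a and b."""
    counts_a = kword_counts(a, k)
    counts_b = kword_counts(b, k)
    # Only k-words present in both sequences contribute to the sum.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def gram_matrix(seqs, k=2):
    """Kernel value for every pair of sequences (symmetric matrix)."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Toy sequences, assumed for illustration only.
    sequences = ["ABAB", "BABA", "CCCC"]
    for row in gram_matrix(sequences, k=2):
        print(row)
```

A matrix of this kind is what a kernel-based learner consumes in place of explicit feature vectors, which is why the choice of kernel affects the resulting grouping of sequences.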