Table 1 Summary of key features of MotifCluster and a selection of other programs that perform clustering of motifs or remote homology detection

From: MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs

| Strategy | Program | Overview of program | Publication |
| --- | --- | --- | --- |
| Clustering proteins by motifs they contain | MotifCluster | Takes aligned or unaligned protein and nucleotide sequences and a MEME file showing motifs; allows clustering of the sequences according to the motifs they contain, and visualization of the motifs on the aligned and unaligned sequences and three-dimensional structures | This article |
| Clustering of transcription factor binding sites (in DNA) | MCAST | Takes list of transcription factor binding sites as input: uses hidden Markov models to find cis-regulatory modules in DNA | [21] |
| | Cluster-Buster | Takes list of transcription factor binding sites as input: uses Forward algorithm and expected uniform distribution to find motif co-occurrence in DNA | [22] |
| | ClusterDraw | Takes list of transcription factor binding sites as input: uses r-scan algorithm and sweep over parameter values to visualize significant clusters as peaks on the DNA sequence | [23] |
| | COMET | Calculates significance of collection of position-specific score matrices that appear in order: can apply to DNA or protein, in principle | [24] |
| | PEAKS | Calculates significance of collection of transcription factor binding sites that appear at specified distance from transcription start site or other feature in the DNA | [25] |
| | CompMoby | Aligns all pairs of motifs that appear significant in different promoters, then groups these into clusters using the CAST algorithm: DNA-specific | [26] |
| | CREME | Identifies groups of DNA motifs that co-occur significantly within a defined distance using both order-dependent and order-independent models | [27] |
| | PHYLOCLUS | Uses Bayesian method to find clusters of evolutionarily conserved DNA motifs that appear in different promoters | [28] |
| | INCLUSive | Clusters genes based on microarray analysis: feeds promoters to Gibbs sampler to find DNA motifs overrepresented in each cluster | [29] |
| Identifying kernels for SVMs* | SVM kernels | Introduces kernels based on k-word occurrences and best BLAST hit for SVM clustering: does not focus on conserved motifs | [30] |
| | WCM (word correlation matrices) | Introduces k-word kernel for SVM clustering based on correlations in appearance of pairs of k-words: does not focus on conserved motifs | [31] |
| | ODH (oligomer distance histograms) | Introduces new kernel for SVM clustering based on histograms of distances between all words in protein: does not focus on conserved motifs | [32] |
| Iterative BLAST | Shotgun | BLAST-based approach for identifying remote homologs by iterative searches: not motif-based | [3] |
| | DivergentSet | Among other features, can perform BLAST and PSI-BLAST versions of Shotgun and choose representative sequences of each group: not motif-based | [20] |
| | Cascade PSI-BLAST | Performs iterative steps of PSI-BLAST, otherwise like Shotgun: not motif-based | [33] |
| | ProClust | Performs graph-based connection of proteins based on pairwise sequence similarity: not motif-based | [34] |
| k-word clustering | CD-Hit | Clusters proteins based on shared segments of overall sequence, not by motifs already known to be significant | [35] |
| Profile-profile alignment | COMPASS | Performs profile-profile alignments for remote homology detection: assesses statistical significance of matches in the profiles overall, rather than specifically using shared motifs | [1] |
| Clustering of motifs | STAMP | Aligns motifs with one another so that relationships among motifs can be detected; performs many other tasks for promoter characterization, but specific to promoters | [36] |
| | TAMO | Performs many functions for cis-regulatory analysis: is able to cluster DNA motifs with one another | [37] |
| | SOMBRERO | Aligns and clusters DNA motifs with one another to improve transcription factor binding site searches | [38] |
| Identification of functions in labeled structures | FunClust | Takes set of three-dimensional structures with annotated functions; identifies three-dimensional motif fragments that are common to the structures with each function | [39] |
*SVMs are support vector machines, a common machine learning approach to pattern classification. A kernel is a function that computes the inner product of a pair of input vectors in an abstract feature space; evaluating it for all pairs of inputs is an important step in the process, and the choice of kernel affects the clustering.
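
As an illustration of the kernel idea in the footnote, the following is a minimal sketch of a generic k-word count kernel: each sequence is mapped to a vector of k-word counts, and the kernel value is the inner product of two such vectors. It is not the specific kernel of any program listed in the table; the function names `kword_counts` and `kword_kernel` and the default `k=3` are assumptions made for this example.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kword_kernel(a, b, k=3):
    """Inner product of the k-word count vectors of sequences a and b.

    Words absent from one sequence contribute zero, so only shared
    k-words add to the kernel value.
    """
    counts_a = kword_counts(a, k)
    counts_b = kword_counts(b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHWWPPY"]
    # Kernel (Gram) matrix: one kernel value per pair of sequences.
    gram = [[kword_kernel(s, t) for t in seqs] for s in seqs]
    for row in gram:
        print(row)
```

A matrix of such pairwise values is what an SVM or a clustering method would consume; sequences that share many k-words receive a larger value than unrelated ones.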