Strategy | Program | Overview of program | Publication |
---|---|---|---|
Clustering proteins by motifs they contain | MotifCluster | Takes aligned or unaligned protein and nucleotide sequences and a MEME file showing motifs; allows clustering of the sequences according to the motifs they contain, and visualization of the motifs on the aligned and unaligned sequences and three-dimensional structures | This article |
Clustering of transcription factor binding sites (in DNA) | MCAST | Takes list of transcription factor binding sites as input: uses hidden Markov models to find cis-regulatory modules in DNA | [21] |
 | Cluster-Buster | Takes list of transcription factor binding sites as input: uses Forward algorithm and expected uniform distribution to find motif co-occurrence in DNA | [22] |
 | ClusterDraw | Takes list of transcription factor binding sites as input: uses r-scan algorithm and sweep over parameter values to visualize significant clusters as peaks on the DNA sequence | [23] |
 | COMET | Calculates significance of collection of position-specific score matrices that appear in order: can apply to DNA or protein, in principle | [24] |
 | PEAKS | Calculates significance of collection of transcription factor binding sites that appear at specified distance from transcription start site or other feature in the DNA | [25] |
 | CompMoby | Aligns all pairs of motifs that appear significant in different promoters, then groups these into clusters using the CAST algorithm. DNA-specific | [26] |
 | CREME | Identifies groups of DNA motifs that co-occur significantly within a defined distance using both order-dependent and order-independent models | [27] |
 | PHYLOCLUS | Uses Bayesian method to find clusters of evolutionarily conserved DNA motifs that appear in different promoters | [28] |
 | INCLUSive | Clusters genes based on microarray analysis: feeds promoters to Gibbs sampler to find DNA motifs overrepresented in each cluster | [29] |
Identifying kernels for SVMs* | SVM kernels | Introduces kernels based on k-word occurrences and best BLAST hit for SVM clustering: does not focus on conserved motifs | [30] |
 | WCM (word correlation matrices) | Introduces k-word kernel for SVM clustering based on correlations in appearance of pairs of k-words: does not focus on conserved motifs | [31] |
 | ODH (oligomer distance histograms) | Introduces new kernel for SVM clustering based on histograms of distances between all words in protein: does not focus on conserved motifs | [32] |
Iterative BLAST | Shotgun | BLAST-based approach for identifying remote homologs by iterative searches: not motif-based | [3] |
 | DivergentSet | Among other features, can perform BLAST and PSI-BLAST versions of Shotgun and choose representative sequences of each group: not motif-based | [20] |
 | Cascade PSI-BLAST | Performs iterative steps of PSI-BLAST, otherwise like Shotgun: not motif-based | [33] |
 | ProClust | Performs graph-based connection of proteins based on pairwise sequence similarity: not motif-based | [34] |
k-word clustering | CD-Hit | Clusters proteins based on shared segments of overall sequence, not by motifs already known to be significant | [35] |
Profile-profile alignment | COMPASS | Performs profile-profile alignments for remote homology detection: assesses statistical significance of matches in the profiles overall, rather than specifically using shared motifs | [1] |
Clustering of motifs | STAMP | Aligns motifs with one another so that relationships among motifs can be detected; performs many other tasks for promoter characterization, but specific to promoters | [36] |
 | TAMO | Performs many functions for cis-regulatory analysis: is able to cluster DNA motifs with one another | [37] |
 | SOMBRERO | Aligns and clusters DNA motifs with one another to improve transcription factor binding site searches | [38] |
Identification of functions in labeled structures | FunClust | Takes set of three-dimensional structures with annotated functions; identifies three-dimensional motif fragments that are common to the structures with each function | [39] |
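
To make the k-word idea behind the SVM-kernel rows of the table concrete, the following is a minimal Python sketch of a generic k-word (k-mer) count kernel. It is not taken from any of the listed programs or publications; the function names `kword_counts` and `kword_kernel` and the default `k=3` are assumptions chosen for illustration. The sketch only shows why such kernels compare overall sequence composition rather than conserved motifs.

```python
from collections import Counter


def kword_counts(seq, k=3):
    """Count every overlapping k-word (k-mer) in a sequence.

    Hypothetical helper for illustration only.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kword_kernel(a, b, k=3):
    """Inner product of the k-word count vectors of two sequences.

    Every shared k-word contributes, whether or not it lies in a
    conserved motif, so the score reflects overall composition.
    """
    counts_a = kword_counts(a, k)
    counts_b = kword_counts(b, k)
    # Counter returns 0 for k-words that are absent from b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Two sequences with no k-word in common score zero under this kernel, and the score grows with every shared k-word regardless of where it occurs in the sequence. That is the sense in which the table describes these approaches as not focusing on conserved motifs.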