  • Web report
  • Open Access

Finding motifs in protein sequences

Genome Biology 2000, 1:reports2052

https://doi.org/10.1186/gb-2000-1-3-reports2052

  • Received: 2 August 2000
  • Published:

Abstract

EMOTIF search is one of a set of four integrated bioinformatics resources at Stanford University devoted to constructing and searching for motifs in protein sequences.

Keywords

  • Multiple Sequence Alignment
  • Matching Protein
  • Search Site
  • Single Mismatch
  • Complete Motif

Content

EMOTIF search is one of a set of four integrated bioinformatics resources at Stanford University devoted to constructing and searching for motifs in protein sequences. EMOTIF search, the subject of this report, finds motifs in user-specified proteins; EMOTIF maker constructs motifs from multiple sequence alignments of proteins; EMOTIF scan searches for proteins that contain a user-specified motif; and 3MOTIF finds three-dimensional motifs using protein structures.

Navigation

Reporter's comments

Timeliness

There is no indication of when the site was last updated, or what version of each of the sequence databases is being searched.

Best feature

The site is very simple to use, and the integration of the various resources is very useful. One can make a motif, search for proteins with the motif, and then determine if they, in turn, share any other motifs.

Worst feature

Unfortunately, the results are of dubious use. When I tested one of my favorite proteins, a putative glycosyltransferase from Arabidopsis, one of the true conserved motifs was buried in a mess of false positives (though the page claims that no false positives are expected at that stringency). Worse, when I used the supplied link to check the description of the 'true hit' in the BLOCKS database, I received an error saying that no such BLOCK exists. When I used the link to initiate an EMOTIF scan, I was presented with a substantial list of matching proteins from both SWISS-PROT and GenBank. But closer inspection revealed that a number of proteins that should have matched the same motif were not present. In fact, of the 22 known Arabidopsis proteins with this particular glycosyltransferase motif, not a single one was in the list, which is a glaring omission.

In the interests of fairness, I decided to test another protein: a multifunctional protein involved in beta-oxidation of fatty acids. This protein contains several very clear domains, which match the PROSITE consensus sequences for these motifs. One domain was identified (in fact, 18 times), but the other domains were not. An EMOTIF scan with several of the motif matches again returned none of the Arabidopsis sequences that contain these motifs. Although it is not stated anywhere on the site, it seems clear that only a subset of the protein database (or a very old version) is being searched.

I then tried allowing a single mismatch in the EMOTIF scan, thinking that a single amino-acid mismatch might be causing some proteins to be omitted, and discovered that this feature is obviously broken. Instead of a short list of matching proteins with the protein motif highlighted, the search started spewing an incredible number of full-length protein sequences, without any highlighting or notation.
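To make concrete what a single-mismatch scan would be expected to return, here is a minimal sketch in Python. It is not EMOTIF's implementation, which the site does not document. The pattern syntax (one letter per position, bracketed groups such as `[ILV]`, and `x` for any residue), the function names `parse_motif` and `scan`, and the example sequence are all assumptions made for illustration.

```python
import re


def parse_motif(pattern):
    """Turn a pattern such as 'G[ILV]xC' into one entry per position.

    Each entry is a set of allowed residues, or None for a wildcard
    ('x' or '.'). This syntax is a simplified assumption, not the
    format used by the EMOTIF site.
    """
    positions = []
    for token in re.findall(r"\[[A-Z]+\]|[A-Zx.]", pattern):
        if token in ("x", "."):
            positions.append(None)
        else:
            positions.append(set(token.strip("[]")))
    return positions


def scan(pattern, sequence, max_mismatches=0):
    """Slide the motif along the sequence and report windows that
    disagree with it at no more than max_mismatches positions.

    Returns a list of (start, matched_window, mismatch_count) tuples,
    with start counted from 0.
    """
    positions = parse_motif(pattern)
    width = len(positions)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1
            for allowed, residue in zip(positions, window)
            if allowed is not None and residue not in allowed
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical sequence: one exact occurrence and one near miss.
sequence = "MAGIACKGPTCA"
exact_hits = scan("G[ILV]xC", sequence)
relaxed_hits = scan("G[ILV]xC", sequence, max_mismatches=1)
```

In this sketch, relaxing the threshold from zero to one mismatch adds the near-miss window `GPTC` to the exact hit `GIAC`. The output stays a short list of located motif occurrences, which is the behavior one would expect from the option rather than a dump of full-length sequences.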

It should be noted that the EMOTIF site has undergone some revisions in the month since this report was written. The navigation has not changed and there still appear to be problems with the results: it is now more likely that no results will be returned than that the user will be given spurious ones.

Wish list

The site needs better documentation to let people know how the programs work and to state clearly the limitations of the tools. I searched through most of the site and the only help pages I could find were for the construction of EMOTIFs from multiple sequence alignments.

Related websites

Two better sites for motif searches are the BLOCKS servers and the PROSITE database of protein families and domains.
