  • Web report
  • Open Access

Gene recognition via spliced alignment

Genome Biology 2000, 1:reports233

https://doi.org/10.1186/gb-2000-1-1-reports233

  • Received: 10 January 2000
  • Published:

Abstract

The PROCRUSTES server provides a method for determining protein-coding sequences in genomic DNA.

Keywords

  • Related Sequence
  • Predict Protein Sequence
  • Gene Recognition
  • Intron Size
  • Partial cDNA Sequence

Content

The PROCRUSTES server provides a method for determining protein-coding sequences in genomic DNA. The main difference between PROCRUSTES and other gene-finding programs is that PROCRUSTES allows the user to supply a related protein sequence, which the program then uses to define the best multi-exon structure for the predicted protein. The resulting prediction is often much better than that produced by other programs, especially for genes with many introns.
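The idea of letting a related protein pick the exon structure can be illustrated with a toy example. The sketch below is not the PROCRUSTES algorithm: it enumerates every ordered chain of candidate blocks by brute force, translates each chain, and keeps the one most similar to the related protein. The function names (`translate`, `chains`, `best_chain`), the `min_intron` parameter, and the use of Python's `difflib` ratio as a stand-in for a real alignment score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    return "".join(CODON.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def chains(blocks, min_intron=4):
    """Yield every ordered chain of blocks separated by at least min_intron bases.

    min_intron is a hypothetical parameter; 4 leaves room for GT...AG.
    """
    blocks = sorted(blocks)

    def extend(chain, first):
        if chain:
            yield chain
        for k in range(first, len(blocks)):
            if not chain or blocks[k][0] >= chain[-1][1] + min_intron:
                yield from extend(chain + [blocks[k]], k + 1)

    yield from extend([], 0)


def best_chain(dna, blocks, target):
    """Return the chain whose translation is most similar to the target protein."""
    def score(chain):
        protein = translate("".join(dna[s:e] for s, e in chain))
        # Assumption: a simple similarity ratio stands in for an alignment score.
        return SequenceMatcher(None, protein, target).ratio()

    return max(chains(blocks), key=score)


# Two true exons (ATGGCC, AAATGG) separated by the intron GTCCAG,
# plus a decoy block that reads straight through the intron.
dna = "ATGGCCGTCCAGAAATGG"
blocks = [(0, 6), (12, 18), (0, 18)]
print(best_chain(dna, blocks, "MAKW"))
```

Brute-force enumeration grows exponentially with the number of blocks, so this is only workable for tiny inputs; it is meant to show why a related protein helps discriminate between competing exon structures, not how to do so efficiently.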

Reporter's comments

Timeliness

Last updated 2 January 1997.

Best feature

The ability to use a related sequence to determine the gene structure for an unknown gene is a powerful tool. Even distantly related proteins can be extremely useful in predicting exons in unknown sequence. The program outputs a combined graphic showing the predicted gene structures from all related proteins submitted and, for each related sequence, a separate table of exons, the sequence alignments, the predicted protein sequence, and a confidence score.

Worst feature

PROCRUSTES uses a very strict definition of splice sites, which can cause problems. The set of candidate exons is constructed by selecting all blocks between candidate acceptor and donor sites (that is, between an AG dinucleotide at an intron-exon boundary and a GU dinucleotide, GT in the genomic DNA, at an exon-intron boundary). As a result, if a true splice site deviates from these consensus dinucleotides, the program will either fail to find the correct exons or define exons of the wrong length. As slight deviations are fairly common, this is a major drawback.
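The candidate-exon rule described above can be sketched in a few lines. This is a minimal illustration under the stated AG/GT assumption, not the server's code; the function name `candidate_exons` and the half-open `(start, end)` coordinate convention are choices made here for the example.

```python
def candidate_exons(dna: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) blocks lying between an AG and a later GT."""
    dna = dna.upper()
    # A block may start immediately after any AG (candidate acceptor site).
    starts = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    # A block may end immediately before any GT (candidate donor site; GU in RNA).
    ends = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    return [(s, e) for s in starts for e in ends if s < e]


canonical = "CCAGATGCCGTAA"      # ...AG | ATGCC | GT...
non_canonical = "CCAGATGCCGCAA"  # same exon, but the donor site is GC

print(candidate_exons(canonical))      # [(4, 9)]
print(candidate_exons(non_canonical))  # []
```

The second call shows the drawback in miniature: with a GC donor site the true exon never enters the candidate set, so no later step can recover it.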

Wish list

Allow the user to submit up to ten related sequences in a single FASTA-formatted file. Currently, each related sequence has to be cut and pasted into the web form separately. Allow the integration of organism-specific splice-site prediction programs (like NetGene2) to increase the accuracy of the program. Fully optimize the parameters for filtering exons for organisms other than mammals. Allow the integration of partial cDNA sequence information when this data is available.

Related websites

There are a number of gene prediction websites, including GENSCAN, Grail, GeneMark and Genie.

Copyright

© BioMed Central Ltd 2000