- Web report
- Open Access
Gene recognition via spliced alignment
- Todd Richmond
© BioMed Central Ltd 2000
- Received: 10 January 2000
- Published: 17 March 2000
The PROCRUSTES server provides a method for determining protein-coding sequences in genomic DNA.
- Related Sequence
- Predict Protein Sequence
- Gene Recognition
- Intron Size
- Partial cDNA Sequence
The PROCRUSTES server provides a method for determining protein-coding sequences in genomic DNA. The main difference between PROCRUSTES and other gene-finding programs is that PROCRUSTES allows the user to supply a related protein sequence, which the program then uses to define the best multi-exon structure for the predicted protein. The resulting prediction is often much better than that produced by other programs, especially for genes with many introns.
Last updated 2 January 1997.
The ability to use a related sequence to determine the gene structure for an unknown gene is a powerful tool. Even distantly related proteins can be extremely useful in predicting exons in unknown sequence. The program outputs a combined graphic showing the predicted gene structures from all related proteins submitted. For each related sequence it also gives a separate table of exons, the sequence alignments, the predicted protein sequence, and a confidence score.
PROCRUSTES uses a very strict definition of splice sites, which can cause problems. The set of candidate exons is constructed by selecting all blocks between candidate acceptor and donor sites (that is, between an AG dinucleotide at an intron-exon boundary and a GT dinucleotide, GU in the transcript, at an exon-intron boundary). As a result, if a real splice site deviates from this AG/GT consensus, the program will either fail to find the correct exons or define exons of the wrong length. As slight deviations are fairly common, this is a major drawback.
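The candidate-exon construction described above can be illustrated with a minimal sketch. This is not the PROCRUSTES implementation: the function name `candidate_exons` and the `min_len` parameter are hypothetical, the sketch works on the DNA strand (so the donor site is written GT rather than GU), and it considers only internal exons, ignoring start and stop codons and any exon filtering.

```python
def candidate_exons(dna, min_len=1):
    """Return half-open (start, end) intervals for every block that
    begins right after an AG and ends right before a GT.

    Illustrative only; names and parameters are assumptions.
    """
    dna = dna.upper()
    # A candidate exon starts immediately after an acceptor AG.
    starts = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    # A candidate exon ends immediately before a donor GT.
    ends = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    # Every acceptor/donor pair with the donor downstream is a candidate.
    return [(s, e) for s in starts for e in ends if e - s >= min_len]


# Canonical sites: one acceptor (AG) and one donor (GT) flank "ATG".
canonical = "CCAGATGGTCC"
exons = candidate_exons(canonical)
blocks = [canonical[s:e] for s, e in exons]

# Same sequence with a non-canonical GC donor: no candidate is produced.
noncanonical = "CCAGATGGCCC"
missed = candidate_exons(noncanonical)
```

With the canonical sequence, the single candidate block is `ATG`. Changing the donor from GT to GC leaves no candidate at all, which is the failure mode described above: a strict dinucleotide rule cannot recover an exon whose boundary deviates from the consensus.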
Allow the user to submit up to ten related sequences in a single FASTA-formatted file; currently, each related sequence has to be cut and pasted into the web form separately. Allow the integration of organism-specific splice-site prediction programs (such as NetGene2) to increase the accuracy of the program. Fully optimize the parameters for filtering exons for organisms other than mammals. Allow the integration of partial cDNA sequence information when such data are available.