Table 3 Comparison of prediction performances

From: Refinement and prediction of protein prenylation motifs

| | Prosite PS00294 | Beese, Casey and colleagues' rules | PrePS FT | Prosite PS00294 | Beese, Casey and colleagues' rules | PrePS GGT1 |
|---|---|---|---|---|---|---|
| Sensitivity I | 85%* | 72% | 100% | 95%* | 67% | 100% |
| Sensitivity II | NA | NA | 92.6/97.9% | NA | NA | 98.6% |
| Probability of false positive prediction (POFP) for -CXXX motifs (GenBank sequences) | 17.1%* | 9.9% | 6.3% | 17.1%* | 10.0% | 1.2% |
| POFP -CXXX 'cytoplasmic' | 18.2%* | 8.9% | 5.1% | 18.2%* | 8.6% | 1.4% |
| POFP -CXXX 'nuclear' | 13.9%* | 10.5% | 5.5% | 13.9%* | 9.6% | 1.1% |
| POFP -CXXX 'membrane' | 17.5%* | 10.3% | 3.8% | 17.5%* | 12.0% | 0.8% |
| POFP -CXXX 'extracellular'$ | 8.6%* | 7.9% | 3.3% | 8.6%* | 9.0% | 0.2% |
| Overall probability of false positive prediction (GenBank sequences, assuming 1.7% with -CXXX) | 0.29%* | 0.16% | 0.11% | 0.29%* | 0.17% | 0.02% |
  1. *Prosite pattern PS00294 does not distinguish between prenylation by FT and GGT1.
  2. Sensitivity rises to 97.9% when the exceptional motif CRPQ of hepatitis delta antigen is removed. For details see Materials and methods.

Sensitivity I is the rate of finding known substrates from the described learning set (self-consistency). Sensitivity II is the rate of finding known substrates after they, and their homologs, have been excluded from the learning set (cross-validation; see Materials and methods). The probabilities of false-positive prediction (POFP) complement the specificities to 100% (Specificity = 100 - POFP). The first listed POFP estimates the rate of false positives among query proteins that have a canonical -CXXX motif (which corresponds to 1.7% of all sequences). The following rows give POFP estimates for subsets of Swiss-Prot proteins that differ in their annotated subcellular localization (see Materials and methods). The final POFP is the estimated rate of false-positive predictions over all sequences (for example, when analyzing complete proteomes or large databases), regardless of whether they carry a -CXXX motif. Formatting signifies: best (bold), intermediate (plain text), worst (italic) performance.
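
The legend's two relations can be checked with a short calculation. The sketch below is only an illustration. The relation Specificity = 100 - POFP is taken from the legend. The overall POFP, however, is assumed here to be the POFP among -CXXX proteins multiplied by the 1.7% fraction of sequences carrying a -CXXX motif; the legend does not state this product explicitly, although it matches the tabulated values to within rounding. The function names `specificity` and `overall_pofp` are hypothetical.

```python
# Fraction of all sequences with a canonical -CXXX motif (1.7%, from the legend).
CXXX_FRACTION = 0.017


def specificity(pofp_percent: float) -> float:
    """Specificity in percent, using Specificity = 100 - POFP from the legend."""
    return 100.0 - pofp_percent


def overall_pofp(pofp_cxxx_percent: float,
                 cxxx_fraction: float = CXXX_FRACTION) -> float:
    """Overall POFP in percent.

    Assumption (not stated explicitly in the source): the overall rate is the
    POFP among -CXXX proteins scaled by the fraction of sequences with -CXXX.
    """
    return pofp_cxxx_percent * cxxx_fraction


# POFP for -CXXX motifs (GenBank sequences), copied from the table in column order.
pofp_cxxx_row = [
    ("Prosite PS00294", 17.1),
    ("Beese, Casey and colleagues' rules", 9.9),
    ("PrePS FT", 6.3),
    ("Prosite PS00294", 17.1),
    ("Beese, Casey and colleagues' rules", 10.0),
    ("PrePS GGT1", 1.2),
]

for label, pofp in pofp_cxxx_row:
    print(f"{label}: specificity {specificity(pofp):.1f}%, "
          f"overall POFP {overall_pofp(pofp):.3f}%")
```

Under this assumption, 17.1% gives 0.2907%, in line with the tabulated 0.29%. The 9.9% entry gives 0.1683%, against 0.16% in the table, so the published figures were presumably computed from unrounded inputs.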