Table 1. Statistics of the 25 selected tracks, arranged in the order of the UCSC genome browser

From: AceView: a comprehensive cDNA-supported gene and transcripts annotation

| UCSC track | Model with introns | Model with introns and CDS | Single exon model (some clipped) | Unique introns in mRNA | All introns in mRNA | Input or method |
|---|---|---|---|---|---|---|
| HAVANA Gencode (Sanger, UK) known + putative | 1,691 | 649 | 70 | 3,618 | 9,693 | MEP,CA,H |
| EGASP model submissions | | | | | | |
| AceView (NCBI, US) | 1,630 | 1,460 | 24 | 3,530 | 9,597 | ME,(H) |
| UP Dogfish (Sanger, UK) | 204 | 204 | 15 | 1,679 | 1,679 | CA |
| Exogean (ENS, France) | 554 | 538 | 2 | 2,855 | 6,178 | MEP,CA |
| UP ExonHunter (U Waterloo, Canada) | 807 | 807 | 220 | 3,237 | 3,237 | MEP,CA |
| Fgenesh (U London, UK) | 462 | 458 | 97 | 2,610 | 3,241 | P,CA |
| UP GeneId (IMIM, Spain) | 267 | 267 | 51 | 1,905 | 1,905 | A |
| UP GeneMark (Georgia IT, US) | 551 | 551 | 81 | 2,185 | 2,185 | A |
| UP Jigsaw (TIGR, US) | 259 | 259 | 67 | 2,168 | 2,168 | MEP,CA |
| PairagonAny (Wash U, US) | 471 | 437 | 38 | 2,300 | 3,470 | MEP?,CA |
| UP SGP2 (IMIM, Spain) | 552 | 552 | 159 | 2,645 | 2,645 | P,CA |
| P Twinscan-MARS (Wash U, US) | 547 | 547 | 108 | 2,501 | 4,943 | CA |
| UP Augustus Any (U Göttingen, Germany) | 312 | 316 | 87 | 2,291 | 2,291 | MEP,CA |
| UP GeneZilla (TIGR, US) | 477 | 477 | 179 | 2,758 | 2,758 | A |
| UP Saga (UC Berkeley, US) | 331 | 331 | 47 | 1,737 | 1,737 | CA |
| UCSC gene tracks | | | | | | |
| *Known Gene (UCSC) | 501 | 477 | 53 | 2,264 | 4,427 | MP |
| *P CCDS | 201 | 201 | 14 | 1,296 | 1,508 | MP,H |
| *RefSeq (NCBI, US) | 342 | 325 | 41 | 2,082 | 2,922 | M(E)P,H |
| *MGC | 323 | 310 | 19 | 1,400 | 2,101 | M |
| *Ensembl (EBI, UK) | 427 | 418 | 58 | 2,429 | 3,548 | MEP,CA |
| *AceView (Aug 2005 NCBI) | 1,792 | 1,627 | 902 | 3,812 | 9,792 | ME,(H) |
| *ECgene (Korea) | 3,851 | 3,551 | 2,569 | 3,942 | 30,660 | ME,C |
| *U NscanEst (Wash U, US) | 282 | 252 | 27 | 2,292 | 2,292 | ME,CA |
| *UP GenScan (MIT, US) | 395 | 395 | 59 | 3,042 | 3,042 | A |
The number of models, with or without introns (after clipping at region boundaries), the number of spliced coding models, and the number of unique and multiply used introns are given over the 31 ENCODE test regions. Coded information has been added in front of the track name: an asterisk marks a standard gene track, available genome-wide, as opposed to an ENCODE-only track; U marks a track that predicts a unique model per gene; P marks a track that predicts protein-coding regions only. According to their documentation, the programs use different inputs or methods: M, E and P stand for human mRNA, EST and protein sequences or alignments, respectively; C stands for conservation, or use of cDNA or protein evidence from other species; A stands for ab initio prediction; H stands for hand curation; and parenthesized letters stand for minimal use of the particular type. Notice the low proportion of Gencode mRNA models with an annotated CDS (in bold).
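The coding scheme described in the footnote can be read mechanically. The following is a minimal Python sketch, not part of the original article, that decodes a track label and a method string and computes the share of spliced models carrying a CDS. The function names (`decode_prefix`, `decode_methods`, `cds_fraction`) are hypothetical, and the sketch assumes that the U/P flags appear as a separate first word of the label, as they do in the table. The "?" in "MEP?,CA" is not defined in the footnote, so it is ignored here.

```python
# Method letters exactly as defined in the table footnote.
METHOD_CODES = {
    "M": "human mRNA",
    "E": "human EST",
    "P": "human protein",
    "C": "conservation or cross-species cDNA/protein evidence",
    "A": "ab initio prediction",
    "H": "hand curation",
}


def decode_prefix(label):
    """Split a track label into its coded flags and the plain track name.

    Assumption: flags are a leading '*' and/or a first word equal to
    'U', 'P' or 'UP', separated from the name by a space.
    """
    genome_wide = label.startswith("*")
    rest = label.lstrip("*").strip()
    first, _, tail = rest.partition(" ")
    flags = first if first in ("U", "P", "UP") else ""
    return {
        "genome_wide": genome_wide,               # '*' prefix
        "unique_model_per_gene": "U" in flags,    # 'U' flag
        "protein_coding_only": "P" in flags,      # 'P' flag
        "name": tail if flags else rest,
    }


def decode_methods(code):
    """Return (letter, meaning, minimal_use) for each letter in a method string.

    Letters inside parentheses are flagged as minimal use; commas, spaces
    and the undefined '?' marker are skipped.
    """
    decoded = []
    minimal = False
    for ch in code:
        if ch == "(":
            minimal = True
        elif ch == ")":
            minimal = False
        elif ch in METHOD_CODES:
            decoded.append((ch, METHOD_CODES[ch], minimal))
    return decoded


def cds_fraction(models_with_introns_and_cds, models_with_introns):
    """Fraction of spliced models that carry an annotated CDS."""
    return models_with_introns_and_cds / models_with_introns


if __name__ == "__main__":
    print(decode_prefix("*UP GenScan (MIT, US)"))
    print(decode_methods("M(E)P,H"))
    # Gencode versus the EGASP AceView submission, values taken from the table.
    print(f"Gencode: {cds_fraction(649, 1691):.1%}")
    print(f"AceView: {cds_fraction(1460, 1630):.1%}")
```

With the table values, the last two lines give roughly 38% for Gencode against roughly 90% for the EGASP AceView submission, which is the contrast the footnote points out.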