
Table 1 Identification of exons on the genome after vector screening using transcript, rodent, and protein databases.

From: A draft annotation and overview of the human genome

Category | Database | Total Records | Percent Placed | Unique Exons | Exon Length (bp) | Putative Genes (Non-Splicing Singletons) | Protein Homology (Pfam Hit) | CpG Islands
Known genes | UTR-DB | 40,258 | 80% | 19,195 | 6,925,762 | 10,007 (426) | 5,701 (3,813) | 3,866
Known genes | HTDB | 15,305 | 89% | 48,477 | 11,893,081 | 4,816 (148) | 2,938 (1,943) | 1,960
Consensus Transcripts | HINT | 87,000 | 77% | 103,817 | 23,381,024 | 20,357 (959) | 9,121 (6,453) | 7,557
Consensus Transcripts | EG | 62,064 | 80% | 13,085 | 4,562,954 | 4,800 (154) | 2,177 (1,679) | 2,462
Consensus Transcripts | THC | 84,837 | 81% | 38,806 | 12,406,081 | 8,604 (322) | 2,907 (2,026) | 3,983
Transcripts | GenBank CDS | 110,222 | 81% | 41,917 | 5,303,064 | 2,634 (227) | 1,858 (1,607) | 1,178
Transcripts | DbEST Human | 2,154,995 | 73% | 273,881 | 32,288,385 | 20,073 (7,136) | 5,377 (3,745) | 11,807
Rodent Transcripts | MINT | 92,531 | 30% | 8,284 | 866,046 | 777 | 123 (56) | 486
Rodent Transcripts | RINT | 37,367 | 46% | 5,600 | 592,788 | 458 | 65 (32) | 255
Rodent Transcripts | EMBL Rodent | 43,488 | 28% | 5,819 | 724,630 | 202 | 68 (72) | 135
Protein Homology | SWISS-PROT | 86,593 | 38% | 27,526 | 9,858,797 | 1,648 | 1,648 (1,244) | 158
Protein Homology | TrEMBL | 351,834 | 13% | 22,670 | 4,385,497 | 1,185 | 1,185 (654) | 92
Protein Homology | PIR | 182,106 | 16% | 4,106 | 1,355,644 | 321 | 321 (132) | 20
Total | | | | 613,183 | 114,543,753 | 75,982 (9,372) | 33,489 (23,008) | 33,959
  1. The definition of a record varies according to the database. 'Exons' are high-scoring segment pairs from BlastN comparisons to the genome (E < 10^-15 and sequence identity > 90%). Unique Exons and all subsequent columns count only placements that were still possible after the preceding databases had been considered. Placement of rodent transcripts required evidence of splicing and sequence identity > 80%. Protein homology required BlastX E < 10^-15. Pfam hits required a score > 20 using hmmpfam (http://hmmer.wustl.edu). CpG islands were identified with cpgreport (http://www.emboss.org) under standard criteria [24].
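
The placement thresholds in the footnote can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, its field names, and the `passes_placement` function are hypothetical names introduced here, and only the numeric cutoffs come from the footnote. The footnote does not say whether the E-value cutoff also applied to rodent transcripts, so the sketch applies only the two rodent criteria that are stated (splicing evidence and identity > 80%).

```python
from dataclasses import dataclass

# Cutoffs taken from the table footnote.
MAX_EVALUE = 1e-15               # BlastN / BlastX: E < 10^-15
MIN_IDENTITY_HUMAN = 90.0        # human transcript HSPs: identity > 90%
MIN_IDENTITY_RODENT = 80.0       # rodent transcripts: identity > 80%
MIN_PFAM_SCORE = 20.0            # hmmpfam: score > 20


@dataclass
class Hit:
    """Hypothetical record for one high-scoring segment pair."""
    source: str            # "human", "rodent", or "protein"
    evalue: float
    identity: float = 0.0  # percent identity; unused for protein hits here
    spliced: bool = False  # evidence of splicing; only checked for rodent hits


def passes_placement(hit: Hit) -> bool:
    """Return True if the hit meets the footnote's criteria for its source."""
    if hit.source == "human":
        return hit.evalue < MAX_EVALUE and hit.identity > MIN_IDENTITY_HUMAN
    if hit.source == "rodent":
        # Only the criteria stated in the footnote are applied here.
        return hit.spliced and hit.identity > MIN_IDENTITY_RODENT
    if hit.source == "protein":
        return hit.evalue < MAX_EVALUE
    raise ValueError(f"unknown source: {hit.source!r}")


def is_pfam_hit(score: float) -> bool:
    """Return True if an hmmpfam score clears the footnote's threshold."""
    return score > MIN_PFAM_SCORE
```

Because every comparison is strict, a hit sitting exactly on a threshold (for example, identity of exactly 90%) is rejected, which matches the strict inequalities in the footnote.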