
Table 1 Identification of exons on the genome after vector screening using transcript, rodent, and protein databases.

From: A draft annotation and overview of the human genome

| Category | Database | Total Records | Percent Placed | Unique Exons | Exon Length (bp) | Putative Genes (Non-Splicing Singletons) | Protein Homology (Pfam Hits) | CpG Islands |
|---|---|---|---|---|---|---|---|---|
| Known genes | UTR-DB | 40,258 | 80% | 19,195 | 6,925,762 | 10,007 (426) | 5,701 (3,813) | 3,866 |
| | HTDB | 15,305 | 89% | 48,477 | 11,893,081 | 4,816 (148) | 2,938 (1,943) | 1,960 |
| Consensus Transcripts | HINT | 87,000 | 77% | 103,817 | 23,381,024 | 20,357 (959) | 9,121 (6,453) | 7,557 |
| | EG | 62,064 | 80% | 13,085 | 4,562,954 | 4,800 (154) | 2,177 (1,679) | 2,462 |
| | THC | 84,837 | 81% | 38,806 | 12,406,081 | 8,604 (322) | 2,907 (2,026) | 3,983 |
| Transcripts | GenBank CDS | 110,222 | 81% | 41,917 | 5,303,064 | 2,634 (227) | 1,858 (1,607) | 1,178 |
| | dbEST Human | 2,154,995 | 73% | 273,881 | 32,288,385 | 20,073 (7,136) | 5,377 (3,745) | 11,807 |
| Rodent Transcripts | MINT | 92,531 | 30% | 8,284 | 866,046 | 777 | 123 (56) | 486 |
| | RINT | 37,367 | 46% | 5,600 | 592,788 | 458 | 65 (32) | 255 |
| | EMBL Rodent | 43,488 | 28% | 5,819 | 724,630 | 202 | 68 (72) | 135 |
| Protein Homology | SWISS-PROT | 86,593 | 38% | 27,526 | 9,858,797 | 1,648 | 1,648 (1,244) | 158 |
| | TrEMBL | 351,834 | 13% | 22,670 | 4,385,497 | 1,185 | 1,185 (654) | 92 |
| | PIR | 182,106 | 16% | 4,106 | 1,355,644 | 321 | 321 (132) | 20 |
| Total | | | | 613,183 | 114,543,753 | 75,982 (9,372) | 33,489 (23,008) | 33,959 |

1. The definition of a record varies by database; 'exons' are high-scoring segment pairs from BlastN comparisons to the genome (E < 10⁻¹⁵ and sequence identity > 90%). Unique Exons and all subsequent columns count only the placements that remained possible after the databases listed above had been considered. Placement of rodent transcripts required evidence of splicing and sequence identity > 80%. Protein homology required BlastX E < 10⁻¹⁵. Pfam hits required a score > 20 using hmmpfam (http://hmmer.wustl.edu). CpG islands were identified with cpgreport (http://www.emboss.org) under standard criteria [24].
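The Total row can be checked against the per-database rows. The sketch below uses values transcribed by hand from the table (the names `ROWS` and `column_totals` are ours, for illustration only) and sums four columns: Unique Exons, Exon Length, Protein Homology (the number outside the parentheses), and CpG Islands. Each sum reproduces the reported total.

```python
# Per-database values transcribed from Table 1:
# (database, unique exons, exon length in bp, protein homology, CpG islands)
ROWS = [
    ("UTR-DB", 19_195, 6_925_762, 5_701, 3_866),
    ("HTDB", 48_477, 11_893_081, 2_938, 1_960),
    ("HINT", 103_817, 23_381_024, 9_121, 7_557),
    ("EG", 13_085, 4_562_954, 2_177, 2_462),
    ("THC", 38_806, 12_406_081, 2_907, 3_983),
    ("GenBank CDS", 41_917, 5_303_064, 1_858, 1_178),
    ("dbEST Human", 273_881, 32_288_385, 5_377, 11_807),
    ("MINT", 8_284, 866_046, 123, 486),
    ("RINT", 5_600, 592_788, 65, 255),
    ("EMBL Rodent", 5_819, 724_630, 68, 135),
    ("SWISS-PROT", 27_526, 9_858_797, 1_648, 158),
    ("TrEMBL", 22_670, 4_385_497, 1_185, 92),
    ("PIR", 4_106, 1_355_644, 321, 20),
]

# Values reported in the Total row for the same four columns.
REPORTED_TOTALS = (613_183, 114_543_753, 33_489, 33_959)


def column_totals(rows):
    """Sum each numeric column (indices 1-4) across all databases."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


totals = column_totals(ROWS)
print(totals)                     # (613183, 114543753, 33489, 33959)
print(totals == REPORTED_TOTALS)  # True
```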
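The placement criteria in the footnote amount to a per-hit threshold filter followed by an ordered pass in which each database may add only exons not already placed by the databases before it. This is a minimal sketch, not the authors' pipeline. The `Hit` record, its `spliced` flag, the database groupings, and the choice to treat "unique" as "does not overlap an exon already placed" are hypothetical. We also assume the E < 10⁻¹⁵ cutoff still applies to rodent transcripts, with only the identity threshold relaxed to 80%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical genome placement (one HSP) from a similarity search."""
    database: str
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    evalue: float
    identity: float   # percent identity, 0-100
    spliced: bool = False  # assumed flag: source record shows splicing evidence


RODENT_DBS = {"MINT", "RINT", "EMBL Rodent"}
PROTEIN_DBS = {"SWISS-PROT", "TrEMBL", "PIR"}


def passes_thresholds(hit: Hit) -> bool:
    """Apply the footnote's cutoffs, choosing the rule by database type."""
    if hit.database in PROTEIN_DBS:
        return hit.evalue < 1e-15  # BlastX E-value criterion only
    if hit.database in RODENT_DBS:
        # Assumption: E-value cutoff kept, identity relaxed, splicing required.
        return hit.spliced and hit.evalue < 1e-15 and hit.identity > 80.0
    return hit.evalue < 1e-15 and hit.identity > 90.0  # BlastN exon criterion


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unique_exons(hits, db_order):
    """Count, per database, accepted hits not overlapping any earlier placement."""
    placed = []
    counts = {}
    for db in db_order:
        counts[db] = 0
        for hit in (h for h in hits if h.database == db):
            if passes_thresholds(hit) and not any(overlaps(hit, p) for p in placed):
                placed.append(hit)
                counts[db] += 1
    return counts


hits = [
    Hit("UTR-DB", "chr1", 100, 250, 1e-40, 98.0),
    Hit("HTDB", "chr1", 200, 300, 1e-30, 95.0),   # overlaps the UTR-DB exon
    Hit("HTDB", "chr1", 500, 600, 1e-30, 95.0),   # new placement
    Hit("MINT", "chr2", 0, 120, 1e-20, 85.0, spliced=True),
    Hit("RINT", "chr2", 400, 450, 1e-20, 85.0),   # no splicing evidence
    Hit("PIR", "chr3", 10, 90, 1e-10, 50.0),      # E-value too large
]
order = ["UTR-DB", "HTDB", "MINT", "RINT", "PIR"]
print(unique_exons(hits, order))
# {'UTR-DB': 1, 'HTDB': 1, 'MINT': 1, 'RINT': 0, 'PIR': 0}
```

Processing databases in table order is what makes the counts order-dependent, which is why a database lower in the table (such as dbEST) can only contribute exons that the databases above it did not already place.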