Table 1 Data imported by PRESTA

From: PRESTA: associating promoter sequences with information on gene expression

 

                              Human    Mouse

EPD
   Total entries                276      200
   Imported by PRESTA*          214      167
   tcg total†‡                  139      109
   Present in GenBank/EMBL§       0        0
   Weak promoters                 4        3
   Confirmed by one EST#         99       64
   Confirmed by two ESTs#        82       56

GenBank
   Total entries¥             5,870    3,289
   After pre-filter             570      307
   tag                          484      313
   tcg total†‡                  291      208
   tcg non-redundant            241      192
   Not found in EMBL**          128       96

EMBL
   Total entries¥             6,314    2,251
   After pre-filter           1,051      274
   tag created                  820      222
   tcg total†‡                  571      150
   tcg non-redundant            425      145
   Not found in GenBank**       312       49

GenBank + EMBL
   tcg non-redundant            553      241
   Present in EPD§                0        0
   Possibly misannotated         30       16
   Confirmed by one EST#        326      153
   Confirmed by two ESTs#       281      124

*EPD promoters are shown for comparison. Some EPD entries did not meet the PRESTA limit on downstream sequence length.

†Fraction of promoters successfully associated with ESTs.

‡Both 'tag' and 'tcg' are internal PRESTA formats: 'tag' stores the promoter sequences, and 'tcg' adds information about matching ESTs.

§No overlap between the GenBank/EMBL non-redundant set and PRESTA-imported EPD entries was found using pairwise SEQALN alignment of immediately downstream transcribed sequences. This is not an error: EMBL sequences linked from EPD were correctly dissected, as some of them are homologous to dozens of 5' EST ends. Even more surprisingly, there is no apparent overlap between PRESTA and the full human subdivisions of EPD. An EPD entry directly stores a 49-base-pair stretch of the immediately upstream region. The full set of these stretches was downloaded by a simple web agent and compared to an analogous set of PRESTA sequences using SEQALN.

There are no ESTs confirming the transcription start site, and at least two 5' EST ends are longer than expected.

#The 5' end of at least one (or two) matching ESTs maps to the -5 to +30 region relative to the transcription start site. In addition, the ratio of positively mapping to overshooting 5' ends is larger than 1:3. The current PRESTA version neglects the possibility that the library was amplified and that two or more ESTs actually originate from the same cDNA clone.

¥A sample query: (([genbank-Division:rod] & (([genbank-Organism:Mus*] & [genbank-Organism:musculus*]) | [genbank-Organism:Mus musculus*])) & ((((([genbank-FtKey:5\' utr] | [genbank-FtKey:precursor_rna]) | [genbank-FtKey:prim_transcript]) | [genbank-FtKey:promoter]) | [genbank-FtKey:tata_signal]) > parent)).

**Not recovered by an equivalent query. This reflects different feature annotation rather than incomplete synchronization between the two major sequence databases.
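
The EST confirmation rule in footnote # can be illustrated with a minimal Python sketch. This is not PRESTA's code. The function name `is_confirmed`, the integer coordinate convention (5' end position relative to the transcription start site, negative meaning upstream), the inclusive window bounds, and the reading of "overshooting" as any 5' end lying upstream of -5 are all assumptions made for illustration.

```python
# Hypothetical sketch of the footnote # criterion; not taken from PRESTA.
WINDOW_START = -5   # upstream bound of the confirming window (from the footnote)
WINDOW_END = 30     # downstream bound of the confirming window (from the footnote)


def is_confirmed(five_prime_ends, min_positive=1):
    """Decide whether a promoter counts as confirmed by ESTs.

    five_prime_ends: positions of matching EST 5' ends relative to the
        transcription start site (assumed convention: negative = upstream).
    min_positive: 1 for "confirmed by one EST", 2 for "confirmed by two ESTs".
    """
    # ESTs whose 5' end falls inside the -5..+30 window (bounds assumed inclusive).
    positive = sum(1 for p in five_prime_ends if WINDOW_START <= p <= WINDOW_END)
    # Assumption: an "overshooting" EST starts upstream of the window.
    overshooting = sum(1 for p in five_prime_ends if p < WINDOW_START)
    # positive : overshooting > 1 : 3  is equivalent to  3 * positive > overshooting.
    return positive >= min_positive and 3 * positive > overshooting
```

For example, under these assumptions `is_confirmed([10, -50, -60])` holds for the one-EST rule (one positive end against two overshooting ends), while `is_confirmed([10, -50, -60, -70])` does not, because the ratio is then exactly 1:3 rather than larger. As the footnote notes, no attempt is made here to collapse ESTs that may originate from the same cDNA clone.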