Data set / processing step | Human | Mouse |
---|---|---|
EPD | | |
Total entries | 276 | 200 |
Imported by PRESTA* | 214 | 167 |
tcg total†‡ | 139 | 109 |
Present in GenBank/EMBL§ | 0 | 0 |
Weak promoters¶ | 4 | 3 |
Confirmed by one EST# | 99 | 64 |
Confirmed by two ESTs# | 82 | 56 |
GenBank | | |
Total entries¥ | 5,870 | 3,289 |
After pre-filter | 570 | 307 |
tag created‡ | 484 | 313 |
tcg total†‡ | 291 | 208 |
tcg non-redundant | 241 | 192 |
Not found in EMBL** | 128 | 96 |
EMBL | | |
Total entries¥ | 6,314 | 2,251 |
After pre-filter | 1,051 | 274 |
tag created‡ | 820 | 222 |
tcg total†‡ | 571 | 150 |
tcg non-redundant‡ | 425 | 145 |
Not found in GenBank** | 312 | 49 |
GenBank + EMBL | | |
tcg non-redundant | 553 | 241 |
Present in EPD§ | 0 | 0 |
Possibly misannotated¶ | 30 | 16 |
Confirmed by one EST# | 326 | 153 |
Confirmed by two ESTs# | 281 | 124 |
- *EPD promoters are shown for comparison. Some EPD entries did not meet the PRESTA limit on downstream sequence length.
- †Fraction of promoters successfully associated with ESTs.
- ‡Both 'tag' and 'tcg' are internal PRESTA formats: 'tag' stores the promoter sequences, and 'tcg' adds information about matching ESTs.
- §No overlap between the GenBank/EMBL non-redundant set and PRESTA-imported EPD entries was found using pairwise SEQALN alignment of immediately downstream transcribed sequences. This is not an error: EMBL sequences linked from EPD were correctly dissected, as some of them are homologous to dozens of 5' EST ends. Even more surprisingly, there is no apparent overlap between PRESTA and the full human subdivisions of EPD. An EPD entry directly stores a 49-base-pair stretch of the immediately upstream region. The full set of these stretches was downloaded by a simple web agent and compared to an analogous set of PRESTA sequences using SEQALN.
- ¶There are no ESTs confirming the transcription start site, and at least two 5' EST ends are longer than expected.
- #The 5' end of at least one (or two) matching ESTs maps to the -5 to +30 region relative to the transcription start site. In addition, the ratio of positively mapping to overshooting 5' ends is larger than 1:3. The current PRESTA version neglects the possibility that the library was amplified and that two or more ESTs actually originate from the same cDNA clone.
- ¥A sample query: (([genbank-Division:rod] & (([genbank-Organism:Mus*] & [genbank-Organism:musculus*]) | [genbank-Organism:Mus musculus*])) & ((((([genbank-FtKey:5\' utr] | [genbank-FtKey:precursor_rna]) | [genbank-FtKey:prim_transcript]) | [genbank-FtKey:promoter]) | [genbank-FtKey:tata_signal]) > parent)).
- **Not recovered by an equivalent query. This reflects different feature annotation rather than incomplete synchronization between the two major sequence databases.
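The confirmation criteria in footnotes ¶ and # can be illustrated with a short Python sketch. This is not PRESTA code. The function names, the representation of each EST as a single integer offset of its 5' end relative to the annotated transcription start site (negative values upstream), and the reading of "overshooting" as any 5' end upstream of position -5 are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Window from footnote #: a 5' EST end inside -5..+30 counts as confirming.
WINDOW_START = -5
WINDOW_END = 30


def count_ends(offsets: Iterable[int]) -> Tuple[int, int]:
    """Return (confirming, overshooting) counts for a list of 5' end offsets.

    Assumption: an EST "overshoots" (is longer than expected) when its
    5' end lies upstream of the confirmation window.
    """
    confirming = 0
    overshooting = 0
    for offset in offsets:
        if WINDOW_START <= offset <= WINDOW_END:
            confirming += 1
        elif offset < WINDOW_START:
            overshooting += 1
    return confirming, overshooting


def is_confirmed(offsets: Iterable[int], min_confirming: int = 1) -> bool:
    """Footnote #: enough confirming ends, and confirming:overshooting > 1:3."""
    confirming, overshooting = count_ends(offsets)
    # confirming / overshooting > 1/3, written without division so that
    # zero overshooting ends is handled naturally.
    return confirming >= min_confirming and 3 * confirming > overshooting


def is_possibly_misannotated(offsets: Iterable[int]) -> bool:
    """Footnote ¶: no confirming ESTs and at least two overshooting 5' ends."""
    confirming, overshooting = count_ends(offsets)
    return confirming == 0 and overshooting >= 2
```

Under these assumptions, calling `is_confirmed(offsets, min_confirming=2)` corresponds to the "Confirmed by two ESTs" rows, and the default corresponds to "Confirmed by one EST". Like the PRESTA version described in footnote #, the sketch treats every EST as independent and does not account for several ESTs derived from the same cDNA clone.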