
Table 4. Known and novel predicted regulatory elements obtained by applying FastCompare to D. melanogaster and D. pseudoobscura

From: Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach

(a) Known regulatory elements

| Sequence | Rank | DATG (bp) | WATG (bp) | Orientation | U/C | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|
| AACAGCTG | 1 | 373 | [0;1800] | - | 1.64 | - | Known AP-4/MyoD site |
| ATTTGCATA | 3 | 882 | [100;2000] | - | 3.20 | Oct-1 | Known (mammalian) Oct-1 site |
| CACGTGC | 5 | 825.5 | - | - | 1.02 | Myc/Max, PHO4, USF | Known Myc/Max site |
| ATTTATGC | 6 | 866 | - | - | 3.52 | CdxA | Known CdxA site |
| TGACGTCA | 9 | 825 | - | - | 2.36 | CREB | Known CREB site |
| TGATAAG | 11 | 760.5 | [0;1100] | - | 2.53 | GATA | Known GATA site, carbohydrate metabolism (p < 10⁻⁵) |
| TATCGATA | 12 | 168 | [0;1900] | - | 5.39 | - | Known DRE site |
| TTTATGGC | 14 | 978.5 | - | - | 2.82 | Abd-B | Known Abd-B site |
| TAATTGA | 24 | 907 | [0;1900] | - | 2.58 | Ubx, Athb-1 | Known Antp site |
| GAGAGAG | 26 | 705.5 | - | ← (p < 10⁻⁴) | 1.87 | - | Known GAGA site, morphogenesis (p < 10⁻²³) |
| CAGGTGC | 33 | 1020.5 | - | - | 0.83 | Sn | Known Snail site |
| TGACTCA | 46 | 911 | [100;2000] | - | 1.89 | AP-1, GCN4 | Known AP-1 site |
| ATCAATCA | 51 | 967 | [0;1900] | - | 1.72 | Pbx-1 | Known Pbx-1 site |
| AAGGTCA | 93 | 1015.5 | [400;1900] | - | 1.16 | HNF-4, ER | Known HRE |
| AACATGTG | 105 | 994 | [100;2000] | - | 1.62 | - | Known Twist site |
| GTAAACA | 147 | 813 | [0;1200] | - | 2.54 | Freac, SRY | Known DAF-16 site in C. elegans |

(b) Novel predicted regulatory elements

| Sequence | Rank | DATG (bp) | WATG (bp) | Orientation | U/C | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|
| ACACACAC | 2 | 922.5 | - | → (p < 10⁻¹²) | 1.97 | - | Unknown site, embryonic development (p < 10⁻⁹) |
| CAAGGAG | 13 | 1091 | [200;2000] | ← (p < 10⁻⁸) | 0.84 | - | Unknown site |
| GCACACAC | 29 | 886 | - | - | 1.80 | - | Unknown site, histogenesis (p < 10⁻⁵) |
| CAAGTTCA | 30 | 920 | [0;1900] | - | 1.23 | - | Unknown site |
| TAATTAA | 31 | 871 | [500;2000] | - | 3.07 | Ftz | Unknown palindromic homeodomain-like site |
| CAACAACA | 42 | 968.5 | [200;2000] | - | 1.22 | - | Unknown site, regulation of transcription (p < 10⁻⁵) |
| TGGCGCC | 48 | 951 | - | - | 0.84 | - | Unknown palindromic site |
| CCTGTTGC | 111 | 653 | [0;1800] | - | 0.90 | - | Unknown site |
| GTGTGACC | 112 | 296 | [0;1900] | → (p < 10⁻⁵) | 2.22 | - | Unknown site |
| CAGGTAG | 143 | 924.5 | [0;1700] | - | 0.94 | - | Unknown site, cell fate commitment (p < 10⁻⁸) |
| CACACGCA | 145 | 968.5 | - | - | 1.49 | - | Unknown site, cellular morphogenesis (p < 10⁻⁵) |
| GTCAACAA | 169 | 904 | - | - | 1.48 | - | Unknown site, similar to DAF-16 |
| AAATGGCG | 205 | 592 | - | - | 1.54 | - | Unknown site |
| TTGACCCA | 239 | 860 | [0;1700] | - | 1.60 | - | Unknown site |
| TGACACAC | 273 | 860 | - | - | 1.83 | - | Unknown site |
| TGTCAAC | 281 | 999 | [100;1900] |  | 1.55 | - | Unknown site |
(a) For each known regulatory element, we show the best k-mer, its rank within the set of 469 highest-scoring k-mers, the median distance to ATG (DATG, for occurrences upstream of genes within the conserved set), the optimal window (WATG), the orientation bias, the corrected upstream/coding (U/C) ratio, the total (up-regulated/down-regulated) number of microarray conditions in which the k-mer was found (see Methods), TRANSFAC matches, and the best GO enrichment. (b) Novel predicted regulatory elements. The k-mers shown here were selected from the list of 469 highest-scoring k-mers based on their short median distance to ATG, short optimal window, significant orientation bias, strong over-representation ratio (U/C), presence in upstream regions of over- or underexpressed genes in several microarray conditions, palindromicity, or resemblance to known sites in other species.
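Palindromicity is one of the selection criteria listed above. Some entries are exact reverse-complement palindromes (TGACGTCA, TATCGATA), while the two k-mers annotated as palindromic in part (b), TAATTAA and TGGCGCC, have odd length and can only be palindromic over a shorter core. The following minimal Python sketch assumes that "palindromic" means identical to its own reverse complement (the exact definition used for this table is not stated) and reports the longest such core within a k-mer. The function names are illustrative and are not part of FastCompare.

```python
# Uppercase A/C/G/T only; IUPAC ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_revcomp_palindrome(seq: str) -> bool:
    """True if seq reads the same on both strands, e.g. TGACGTCA."""
    return seq == reverse_complement(seq)


def longest_palindromic_core(kmer: str) -> str:
    """Longest substring of kmer that equals its own reverse complement.

    Only even lengths are tried: the central base of an odd-length sequence
    would have to be its own complement, which no base is. Returns "" if no
    palindromic substring exists.
    """
    n = len(kmer)
    for length in range(n - n % 2, 0, -2):  # longest candidates first
        for start in range(n - length + 1):
            core = kmer[start:start + length]
            if is_revcomp_palindrome(core):
                return core
    return ""


if __name__ == "__main__":
    for kmer in ["TGACGTCA", "TATCGATA", "TAATTAA", "TGGCGCC", "CAAGGAG"]:
        core = longest_palindromic_core(kmer)
        print(f"{kmer}: core={core or '-'} ({len(core)} bp)")
```

Under this definition TGACGTCA and TATCGATA are full-length palindromes, whereas TAATTAA and TGGCGCC reduce to the 6-bp cores TAATTA and GGCGCC.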
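The DATG and Orientation columns summarize where, and on which strand, a k-mer occurs upstream of genes. The exact computations are not given in this table, so the following is a minimal Python sketch under stated assumptions: each upstream region is written 5'→3' and ends immediately before the ATG, distances are measured from the leftmost base of an occurrence on either strand, and orientation bias is assessed with a two-sided exact binomial test of forward versus reverse counts against p = 0.5. All function names and the toy regions are hypothetical.

```python
import math
from statistics import median

# Uppercase A/C/G/T only; IUPAC ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_starts(seq: str, kmer: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) exact match."""
    k = len(kmer)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]


def distances_to_atg(upstream: str, kmer: str) -> list[int]:
    """Distances (bp) from each occurrence, on either strand, to the ATG.

    Assumption: `upstream` is 5'->3' and ends immediately before the ATG;
    distance is measured from the leftmost base of the occurrence. A
    palindromic k-mer matches both strands at the same position and is
    counted once there.
    """
    hits = set(find_starts(upstream, kmer))
    hits |= set(find_starts(upstream, reverse_complement(kmer)))
    return sorted(len(upstream) - i for i in hits)


def orientation_counts(regions: list[str], kmer: str) -> tuple[int, int]:
    """Total forward- and reverse-strand occurrences across all regions."""
    rc = reverse_complement(kmer)
    forward = sum(len(find_starts(r, kmer)) for r in regions)
    reverse = sum(len(find_starts(r, rc)) for r in regions)
    return forward, reverse


def orientation_bias_p(forward: int, reverse: int) -> float:
    """Two-sided exact binomial p-value under no strand preference (p = 0.5).

    Each outcome i has probability comb(n, i) / 2**n, so outcomes are compared
    as exact integers and summed when no more likely than the observed one.
    """
    n = forward + reverse
    if n == 0:
        return 1.0
    observed = math.comb(n, forward)
    tail = sum(c for c in (math.comb(n, i) for i in range(n + 1)) if c <= observed)
    return min(1.0, tail / 2**n)


if __name__ == "__main__":
    # Hypothetical toy regions, each ending just before an ATG.
    regions = ["TTACACACACGG", "ACACACACTTTTTT", "GGGTGTGTGTCC"]
    kmer = "ACACACAC"
    dists = [d for r in regions for d in distances_to_atg(r, kmer)]
    fwd, rev = orientation_counts(regions, kmer)
    print("median distance to ATG:", median(dists))
    print("forward/reverse:", fwd, rev, "p =", orientation_bias_p(fwd, rev))
```

Exact integer arithmetic keeps the binomial probabilities from overflowing when occurrence counts reach the thousands. Orientation bias is not meaningful for palindromic k-mers such as TGACGTCA, whose forward and reverse matches coincide.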
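The Comments column also reports the best GO enrichment for each k-mer, for example morphogenesis (p < 10⁻²³) for GAGAGAG. The enrichment test is not described in this table; a common choice for this kind of gene-set overlap is the upper tail of the hypergeometric distribution, and the following minimal Python sketch assumes that test. The function name, its argument names, and the counts in the usage example are hypothetical.

```python
import math


def go_enrichment_p(overlap: int, n_with_kmer: int, n_in_term: int, n_genes: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    n_genes:     genes in the background set
    n_in_term:   background genes annotated with the GO term
    n_with_kmer: background genes whose upstream region carries the k-mer
    overlap:     genes that are both annotated and carry the k-mer
    Exact integer arithmetic avoids float overflow for genome-sized counts.
    """
    top = min(n_with_kmer, n_in_term)
    # math.comb returns 0 when the lower index exceeds the upper one.
    numerator = sum(
        math.comb(n_in_term, i) * math.comb(n_genes - n_in_term, n_with_kmer - i)
        for i in range(overlap, top + 1)
    )
    return numerator / math.comb(n_genes, n_with_kmer)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(go_enrichment_p(overlap=25, n_with_kmer=400, n_in_term=300, n_genes=10000))
```

In practice the p-value for the best term would also need correcting for the number of GO terms tested; this sketch leaves that step out.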