Table 3 List of the significant blocks detected in the pax6 dataset

From: A novel approach to identifying regulatory motifs in distantly related genomes

Block Consensus sequence and possible binding sites
pax6 1.1 (UCSC) CTTAATGATGAGAGATCTTTCCGCTCATTGCCCATTCAAATACAATTGTAGATCGAAGCCGGCCTTGTCAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACT
  *Minimal fragment for expression in lens and cornea as described in [46]: 11-117 +
  Cap, M00253, NCANHNNN: 25-32 + (0.940); 79-86 - (0.964); 4-11 - (0.946); 1-8 - (0.903)
  CCAAT box, M00254, NNNRRCCAATSA: 27-38 + (0.901)
  *CdxA, M00100, 'MTTTATR': 1-7 + (0.921)*; 87-93 + (0.913)
  *CdxA, M00101, AWTWMTR: 1-7 + (0.934); 4-10 + (0.921); 38-44 + (0.905); 87-93 + (0.988)
  c-Ets-1(p54), M00032, NCMGGAWGYN: 98-107 + (0.906)
  c-Ets-1(p54), M00074, NNACMGGAWRTNN: 92-104 - (0.901)
  En-1, M00396, GTANTNN: 37-43 - (0.967)
  GATA-3, M00351, ANAGATMWWA: 11-20 + (0.920)
  HSF2, M00147, NGAANNWTCK: 13-22 - (0.933)
  p53, M00272, NGRCWTGYCY: 101-110 + (0.949)
pax6 1.2 (UCSC) CATTATTGTTGCCAGCACGAAGCATCACAATCAATCATAAGGAAGTCCAGTTGGCAGGTGTCAATCTTG
  CdxA, M00101, AWTWMTR: 1-7 - (0.995)
  Cap, M00253, NCANHNNN: 25-32 + (0.934); 31-38 + (0.903); 35-42 + (0.903); 47-54 + (0.908); 61-68 + (0.937)
  CDP CR3+HD, M00106, NATYGATSSS: 27-36 - (0.907)
  c-Ets-1(p54), M00074, NNACMGGAWRTNN: 36-48 + (0.902)
  *HOXA3, M00395, CNTANNNKN: 1-9 + (0.905)
  MyoD, M00184, NNCACCTGNY: 53-62 - (0.956)
  *Pbx-1, M00096, ANCAATCAW: 30-38 + (0.986); 2-10 - (0.923)
  Sox-5, M00042, NNAACAATNN: 3-12 - (0.932)
  SRY, M00148, AAACWAM: 33-39 + (0.910)
  USF, M00122, NNRNCACGTGNYNN: 51-64 + (0.913); 51-64 - (0.908)
pax6 1.3 (UCSC) GAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCC
  Cap, M00253, NCANHNNN: 3-10 - (0.964)
  CCAAT box, M00254, NNNRRCCAATSA: 52-63 + (0.949)
  CdxA, M00100, 'MTTTATR': 11-17 + (0.913)
  CdxA, M00101, AWTWMTR: 11-17 + (0.988)
  c-Ets-1(p54), M00032, NCMGGAWGYN: 22-31 + (0.906)
  c-Ets-1(p54), M00074, NNACMGGAWRTNN: 16-28 - (0.901)
  En-1, M00396, GTANTNN: 58-64 - (0.948)
  GATA-1, M00075, SNNGATNNNN: 56-65 - (0.930)
  GATA-3, M00077, NNGATARNG: 56-64 - (0.917)
  NF-Y, M00185, TRRCCAATSRN: 54-64 + (0.910)
  p53, M00272, NGRCWTGYCY: 25-34 + (0.949)
  SRY, M00148, AAACWAM: 59-65 + (0.917)
pax6 1.4 (UCSC) GTCTATATTTAATCCAATTATAAGGGTCACGGAGTAAGTGC
  *Motif containing homeoboxes described in [46], TTTAATCCAATTATAA: 8-23 +
  Cap, M00253, NCANHNNN: 34-41 - (0.904)
  CdxA, M00100, 'MTTTATR': 16-22 + (0.907)
  CdxA, M00101, AWTWMTR: 16-22 + (0.995); 16-22 - (0.906); 6-12 - (0.931); 4-10 - (0.951)
  En-1, M00396, GTANTNN: 15-21 - (0.948)
  Nkx2-5, M00240, TYAAGTG: 34-40 + (0.927)
  RORalpha1, M00156, NWAWNNAGGTCAN: 18-30 + (0.919)
  TCF11, M00285, GTCATNNWNNNNN: 26-38 + (0.906)
pax6 1.5 (UCSC) GCATCCAATCACCCCCAGGG
  Cap, M00253, NCANHNNN: 9-16 + (0.965)
  En-1, M00396, GTANTNN: 6-12 - (0.948)
  GATA-3, M00077, NNGATARNG: 4-12 - (0.917)
  SRY, M00148, AAACWAM: 7-13 + (0.917)
pax6 1.6 (UCSC) CAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCCCAGGGAATTCnGCTAATGTCTCC
  *Homeobox-binding site described in [46], GCTAATGTCTC: 87-97 +
  Cap, M00253, NCANHNNN: 69-76 + (0.965); 87-94 - (0.903); 11-18 - (0.964)
  CCAAT box, M00254, NNNRRCCAATSA: 60-71 + (0.949)
  CdxA, M00100, 'MTTTATR': 19-25 + (0.913)
  CdxA, M00101, AWTWMTR: 19-25 + (0.988)
  c-Ets-1(p54), M00032, NCMGGAWGYN: 30-39 + (0.906)
  c-Ets-1(p54), M00074, NNACMGGAWRTNN: 24-36 - (0.901)
  En-1, M00396, GTANTNN: 66-72 - (0.948)
  GATA-1, M00075, SNNGATNNNN: 64-73 - (0.930)
  GATA-3, M00077, NNGATARNG: 64-72 - (0.917)
  NF-Y, M00185, TRRCCAATSRN: 62-72 + (0.910)
  p53, M00272, NGRCWTGYCY: 33-42 + (0.949)
  SRY, M00148, AAACWAM: 67-73 + (0.917)
pax6 2.1 (UCSC) TGGGTCCATTTTCCAGAyGGTTTGTTACTCTTGCTGCmTGATTTrG
  Cap, M00253, NCANHNNN: 6-13 + (0.921)
  CdxA, M00101, AWTWMTR: 9-15 + (0.918)
  SRY, M00148, AAACWAM: 21-27 - (0.942)
pax6 2.2 (-) ATTTTGGTTGCTTTCAGGTwTAATTAACTTT
  Nkx2-5, M00241, CWTAATTG: 21-28 - (0.902)
pax6 2.3 (UCSC) ATTGTAATCATTTCAATTATCTTCA
  Cap, M00253, NCANHNNN: 8-15 + (0.927)
  En-1, M00396, GTANTNN: 14-20 - (0.948)
  Nkx2-5, M00241, CWTAATTG: 14-21 - (0.930)
pax6 2.4 (-) GGTTGCTTTCAGGTwTAATTAACTTTGAACAACAAATA
  Nkx2-5, M00241, CWTAATTG: 16-23 - (0.902)
pax6 3.1 (UCSC) TTGTAATTACTGCCCTTCATGTGGTCCGGTGCCTTGAACCATCTTTAATTAAAAGCATAATTAAGG
  AML-1a, M00271, TGTGGT: 20-25 + (1.000)
  Cap, M00253, NCANHNNN: 39-46 + (0.910); 55-62 + (0.909); 6-13 - (0.916)
  CdxA, M00100, MTTTATR: 56-62 - (0.934)
  CdxA, M00101, AWTWMTR: 6-12 + (0.988); 44-50 + (0.913); 47-53 + (0.900); 48-54 + (0.905); 59-65 + (0.903); 60-66 + (0.926); 56-62 - (0.998); 47-53 - (0.913); 44-50 - (0.901); 43-49 - (0.907); 2-8 - (0.949)
  En-1, M00396, GTANTNN: 3-9 + (0.912); 4-10 - (0.912)
  HSF2, M00147, NGAANNWTCK: 35-44 + (0.908)
  Nkx2-5, M00241, CWTAATTG: 56-63 + (0.935); 58-65 - (0.954)
  USF, M00217, NCACGTGN: 17-24 - (0.921)
pax6 3.2 (UCSC) AAGGCTTGCAGCTGCCTCCAAATCAATAGAyGTCAAAGAAATATGAAAACArTC
  CdxA, M00101, AWTWMTR: 39-45 + (0.953); 36-42 - (0.925)
  SRY, M00148, AAACWAM: 35-41 + (0.961)
  Cap, M00253, NCANHNNN: 8-15 + (0.931); 39-46 - (0.940); 8-15 - (0.931)
  AP-4, M00175, VDCAGCTGNN: 7-16 - (0.902)
  MyoD, M00184, NNCACCTGNY: 7-16 + (0.957)
  SRY, M00160, NWWAACAAWANN: 19-30 + (0.928)
pax6 3.3 (UCSC) GCATAATTAAGGGAAGATCTAAAGAAAGACAATTACCAGATGGTCT
  Cap, M00253, NCANHNNN: 1-8 + (0.909)
  CdxA, M00100, MTTTATR: 2-8 - (0.934)
  CdxA, M00101, AWTWMTR: 5-11 + (0.903); 6-12 + (0.926); 32-38 + (0.939); 2-8 - (0.998)
  En-1, M00396, GTANTNN: 30-36 - (1.000)
  GATA-1, M00075, SNNGATNNNN: 36-45 + (0.936)
  GATA-2, M00076, NNNGATRNNN: 36-45 + (0.922)
  GATA-3, M00351, ANAGATMWWA: 13-22 + (0.949)
  HOXA3, M00395, CNTANNNKN: 29-37 - (0.939)
  Msx-1, M00394, CNGTAWNTG: 30-38 - (0.915)
  MyoD, M00184, NNCACCTGNY: 35-44 - (0.919)
  Nkx2-5, M00241, CWTAATTG: 2-9 + (0.935); 4-11 - (0.954)
  SRY, M00148, AAACWAM: 21-27 + (0.961); 25-31 + (0.927)
  USF, M00122, NNRNCACGTGNYNN: 33-46 + (0.907); 33-46 - (0.904)
For each block, the consensus sequence is given, followed by the possible binding sites located in that block. Motifs previously described in the literature [47] are marked with an asterisk. Each of these motifs is summarized by its name (in bold), its consensus sequence, if known, as described in the original article, the sequence of the motif instance in our search, the positions of the motif instance relative to the consensus sequence of the entire block, and the strand ('+' or '-') on which the motif occurred. Motif hits derived from Transfac are indicated by their matrix accession number, the consensus of the binding site, and the instances of the motif in our search. Each instance is further characterized by its position relative to the consensus sequence of the entire block, the strand on which it occurred, and the corresponding MotifLocator score (in parentheses). Blocks identified by the UCSC genome browser as conserved between mammals and Fugu are marked with 'UCSC', while blocks detected by our two-step methodology but not present in the UCSC genome browser are marked with '-'.
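The row layout described in this footnote can be read programmatically. The following is a minimal Python sketch, not part of the original analysis: the `Hit` record, the `parse_hits` function and the regular expression are hypothetical names introduced here for illustration. It assumes each row has the form "name, accession, consensus: start-end strand (score); ...", that literature motifs may lack an accession or a score, and that positions are 1-based and inclusive (which agrees with the pax6 1.4 homeobox motif at 8-23).

```python
import re
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    name: str                  # factor or motif name, e.g. "CdxA"
    accession: Optional[str]   # Transfac matrix accession; None for literature motifs
    consensus: Optional[str]   # consensus string as printed in the table, if any
    start: int                 # 1-based, inclusive, relative to the block consensus
    end: int
    strand: str                # "+" or "-"
    score: Optional[float]     # MotifLocator score; None when the row lists none
    from_literature: bool      # True when the row starts with an asterisk


# One instance such as "25-32 + (0.940)"; the score and its parentheses are optional.
INSTANCE = re.compile(r"(\d+)-(\d+)\s*([+-])(?:\s*\(?(\d\.\d+)\)?)?")


def parse_hits(line: str) -> List[Hit]:
    """Split one table row into one Hit per listed motif instance."""
    head, _, tail = line.partition(":")
    from_literature = head.lstrip().startswith("*")
    # Drop padding, the literature asterisk and the quotes around some consensus strings.
    fields = [f.strip(" *'") for f in head.split(",")]
    name = fields[0]
    accession = fields[1] if len(fields) == 3 else None
    consensus = fields[-1] if len(fields) > 1 else None
    return [
        Hit(name, accession, consensus,
            int(m.group(1)), int(m.group(2)), m.group(3),
            float(m.group(4)) if m.group(4) else None,
            from_literature)
        for m in INSTANCE.finditer(tail)
    ]


# Usage: recover a motif instance from the pax6 1.4 block consensus.
block = "GTCTATATTTAATCCAATTATAAGGGTCACGGAGTAAGTGC"
row = "  *Motif containing homeoboxes described in [46], TTTAATCCAATTATAA: 8-23 +"
hit = parse_hits(row)[0]
print(block[hit.start - 1:hit.end])  # TTTAATCCAATTATAA
```

The slice applies to '+' strand instances only. An instance on the '-' strand would additionally need a reverse complement that handles the ambiguity codes (s, w, y, r, m, n) appearing in the consensus sequences, which this sketch leaves out.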