Table 3 List of the significant blocks detected in the pax6 dataset

From: A novel approach to identifying regulatory motifs in distantly related genomes

Block

Consensus sequence and possible binding sites

pax6 1.1 (UCSC)

CTTAATGATGAGAGATCTTTCCGCTCATTGCCCATTCAAATACAATTGTAGATCGAAGCCGGCCTTGTCAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACT

 

*Minimal fragment for expression in lens and cornea as described in [46]: 11-117 +

 

Cap, M00253, NCANHNNN: 25-32 + (0.940); 79-86 - (0.964); 4-11 - (0.946); 1-8 - (0.903)

 

CCAAT box, M00254, NNNRRCCAATSA: 27-38 + (0.901)

 

*CdxA, M00100, 'MTTTATR': 1-7 + (0.921); 87-93 + (0.913)

 

*CdxA, M00101, AWTWMTR: 1-7 + (0.934); 4-10 + (0.921); 38-44 + (0.905); 87-93 + (0.988)

 

c-Ets-1(p54), M00032, NCMGGAWGYN: 98-107 + (0.906)

 

c-Ets-1(p54), M00074, NNACMGGAWRTNN: 92-104 - (0.901)

 

En-1, M00396, GTANTNN: 37-43 - (0.967)

 

GATA-3, M00351, ANAGATMWWA: 11-20 + (0.920)

 

HSF2, M00147, NGAANNWTCK: 13-22 - (0.933)

 

p53, M00272, NGRCWTGYCY: 101-110 + (0.949)

pax6 1.2 (UCSC)

CATTATTGTTGCCAGCACGAAGCATCACAATCAATCATAAGGAAGTCCAGTTGGCAGGTGTCAATCTTG

 

CdxA, M00101, AWTWMTR: 1-7 - (0.995)

 

Cap, M00253, NCANHNNN: 25-32 + (0.934); 31-38 + (0.903); 35-42 + (0.903); 47-54 + (0.908); 61-68 + (0.937)

 

CDP CR3+HD, M00106, NATYGATSSS: 27-36 - (0.907)

 

c-Ets-1(p54), M00074, NNACMGGAWRTNN: 36-48 + (0.902)

 

*HOXA3, M00395, CNTANNNKN: 1-9 + (0.905)

 

MyoD, M00184, NNCACCTGNY: 53-62 - (0.956)

 

*Pbx-1, M00096, ANCAATCAW: 30-38 + (0.986); 2-10 - (0.923)

 

Sox-5, M00042, NNAACAATNN: 3-12 - (0.932)

 

SRY, M00148, AAACWAM: 33-39 + (0.910)

 

USF, M00122, NNRNCACGTGNYNN: 51-64 + (0.913); 51-64 - (0.908)

pax6 1.3 (UCSC)

GAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCC

 

Cap, M00253, NCANHNNN: 3-10 - (0.964)

 

CCAAT box, M00254, NNNRRCCAATSA: 52-63 + (0.949)

 

CdxA, M00100, 'MTTTATR': 11-17 + (0.913)

 

CdxA, M00101, AWTWMTR: 11-17 + (0.988)

 

c-Ets-1(p54), M00032, NCMGGAWGYN: 22-31 + (0.906)

 

c-Ets-1(p54), M00074, NNACMGGAWRTNN: 16-28 - (0.901)

 

En-1, M00396, GTANTNN: 58-64 - (0.948)

 

GATA-1, M00075, SNNGATNNNN: 56-65 - (0.930)

 

GATA-3, M00077, NNGATARNG: 56-64 - (0.917)

 

NF-Y, M00185, TRRCCAATSRN: 54-64 + (0.910)

 

p53, M00272, NGRCWTGYCY: 25-34 + (0.949)

 

SRY, M00148, AAACWAM: 59-65 + (0.917)

pax6 1.4 (UCSC)

GTCTATATTTAATCCAATTATAAGGGTCACGGAGTAAGTGC

 

*Motif containing homeoboxes described in [46], TTTAATCCAATTATAA: 8-23 +

 

Cap, M00253, NCANHNNN: 34-41 - (0.904)

 

CdxA, M00100, 'MTTTATR': 16-22 + (0.907)

 

CdxA, M00101, AWTWMTR: 16-22 + (0.995); 16-22 - (0.906); 6-12 - (0.931); 4-10 - (0.951)

 

En-1, M00396, GTANTNN: 15-21 - (0.948)

 

Nkx2-5, M00240, TYAAGTG: 34-40 + (0.927)

 

RORalpha1, M00156, NWAWNNAGGTCAN: 18-30 + (0.919)

 

TCF11, M00285, GTCATNNWNNNNN: 26-38 + (0.906)

pax6 1.5 (UCSC)

GCATCCAATCACCCCCAGGG

 

Cap, M00253, NCANHNNN: 9-16 + (0.965)

 

En-1, M00396, GTANTNN: 6-12 - (0.948)

 

GATA-3, M00077, NNGATARNG: 4-12 - (0.917)

 

SRY, M00148, AAACWAM: 7-13 + (0.917)

pax6 1.6 (UCSC)

CAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCCCAGGGAATTCnGCTAATGTCTCC

 

*Homeobox-binding site described in [46], GCTAATGTCTC: 87-97 +

 

Cap, M00253, NCANHNNN: 69-76 + (0.965); 87-94 - (0.903); 11-18 - (0.964)

 

CCAAT box, M00254, NNNRRCCAATSA: 60-71 + (0.949)

 

CdxA, M00100, 'MTTTATR': 19-25 + (0.913)

 

CdxA, M00101, AWTWMTR: 19-25 + (0.988)

 

c-Ets-1(p54), M00032, NCMGGAWGYN: 30-39 + (0.906)

 

c-Ets-1(p54), M00074, NNACMGGAWRTNN: 24-36 - (0.901)

 

En-1, M00396, GTANTNN: 66-72 - (0.948)

 

GATA-1, M00075, SNNGATNNNN: 64-73 - (0.930)

 

GATA-3, M00077, NNGATARNG: 64-72 - (0.917)

 

NF-Y, M00185, TRRCCAATSRN: 62-72 + (0.910)

 

p53, M00272, NGRCWTGYCY: 33-42 + (0.949)

 

SRY, M00148, AAACWAM: 67-73 + (0.917)

pax6 2.1 (UCSC)

TGGGTCCATTTTCCAGAyGGTTTGTTACTCTTGCTGCmTGATTTrG

 

Cap, M00253, NCANHNNN: 6-13 + (0.921)

 

CdxA, M00101, AWTWMTR: 9-15 + (0.918)

 

SRY, M00148, AAACWAM: 21-27 - (0.942)

pax6 2.2 (-)

ATTTTGGTTGCTTTCAGGTwTAATTAACTTT

 

Nkx2-5, M00241, CWTAATTG: 21-28 - (0.902)

pax6 2.3 (UCSC)

ATTGTAATCATTTCAATTATCTTCA

 

Cap, M00253, NCANHNNN: 8-15 + (0.927)

 

En-1, M00396, GTANTNN: 14-20 - (0.948)

 

Nkx2-5, M00241, CWTAATTG: 14-21 - (0.930)

pax6 2.4 (-)

GGTTGCTTTCAGGTwTAATTAACTTTGAACAACAAATA

 

Nkx2-5, M00241, CWTAATTG: 16-23 - (0.902)

pax6 3.1 (UCSC)

TTGTAATTACTGCCCTTCATGTGGTCCGGTGCCTTGAACCATCTTTAATTAAAAGCATAATTAAGG

 

AML-1a, M00271, TGTGGT: 20-25 + (1.000)

 

Cap, M00253, NCANHNNN: 39-46 + (0.910); 55-62 + (0.909); 6-13 - (0.916)

 

CdxA, M00100, MTTTATR: 56-62 - (0.934)

 

CdxA, M00101, AWTWMTR: 6-12 + (0.988); 44-50 + (0.913); 47-53 + (0.900); 48-54 + (0.905); 59-65 + (0.903); 60-66 + (0.926); 56-62 - (0.998); 47-53 - (0.913); 44-50 - (0.901); 43-49 - (0.907); 2-8 - (0.949)

 

En-1, M00396, GTANTNN: 3-9 + (0.912); 4-10 - (0.912)

 

HSF2, M00147, NGAANNWTCK: 35-44 + (0.908)

 

Nkx2-5, M00241, CWTAATTG: 56-63 + (0.935); 58-65 - (0.954)

 

USF, M00217, NCACGTGN: 17-24 - (0.921)

pax6 3.2 (UCSC)

AAGGCTTGCAGCTGCCTCCAAATCAATAGAyGTCAAAGAAATATGAAAACArTC

 

CdxA, M00101, AWTWMTR: 39-45 + (0.953); 36-42 - (0.925)

 

SRY, M00148, AAACWAM: 35-41 + (0.961)

 

Cap, M00253, NCANHNNN: 8-15 + (0.931); 39-46 - (0.940); 8-15 - (0.931)

 

AP-4, M00175, VDCAGCTGNN: 7-16 - (0.902)

 

MyoD, M00184, NNCACCTGNY: 7-16 + (0.957)

 

SRY, M00160, NWWAACAAWANN: 19-30 + (0.928)

pax6 3.3 (UCSC)

GCATAATTAAGGGAAGATCTAAAGAAAGACAATTACCAGATGGTCT

 

Cap, M00253, NCANHNNN: 1-8 + (0.909)

 

CdxA, M00100, MTTTATR: 2-8 - (0.934)

 

CdxA, M00101, AWTWMTR: 5-11 + (0.903); 6-12 + (0.926); 32-38 + (0.939); 2-8 - (0.998)

 

En-1, M00396, GTANTNN: 30-36 - (1.000)

 

GATA-1, M00075, SNNGATNNNN: 36-45 + (0.936)

 

GATA-2, M00076, NNNGATRNNN: 36-45 + (0.922)

 

GATA-3, M00351, ANAGATMWWA: 13-22 + (0.949)

 

HOXA3, M00395, CNTANNNKN: 29-37 - (0.939)

 

Msx-1, M00394, CNGTAWNTG: 30-38 - (0.915)

 

MyoD, M00184, NNCACCTGNY: 35-44 - (0.919)

 

Nkx2-5, M00241, CWTAATTG: 2-9 + (0.935); 4-11 - (0.954)

 

SRY, M00148, AAACWAM: 21-27 + (0.961); 25-31 + (0.927)

 

USF, M00122, NNRNCACGTGNYNN: 33-46 + (0.907); 33-46 - (0.904)

  1. For each block, the consensus sequence is given, followed by the possible binding sites located in that block. Motifs previously described in the literature [47] are marked with an asterisk. Each of these motifs is summarized by its motif name (in bold), by its consensus sequence as described in the original article (if known), by the sequence of the motif instance found in our search, by the position of that instance relative to the consensus sequence of the entire block, and by the strand ('+' or '-') on which the motif occurred. Motif hits derived from Transfac are listed with their matrix accession number, the consensus of the binding site, and the instances of the motif found in our search. Each instance is further characterized by its position relative to the consensus sequence of the entire block, by the strand on which the motif occurred, and by the corresponding MotifLocator score (in parentheses). Blocks identified by the UCSC genome browser as conserved between mammals and Fugu are marked 'UCSC'; blocks detected by our two-step methodology but not present in the UCSC genome browser are marked '-'.
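
The row layout described in the footnote (name, Transfac accession, consensus, then one or more "start-end strand (score)" instances) can be read mechanically. The following is a minimal sketch, not part of the original article: the names `parse_entry`, `instance`, and `compatible` are hypothetical helpers, positions are assumed to be 1-based and inclusive, and '-' strand instances are assumed to be read as the reverse complement of the block consensus. The compatibility check against the IUPAC consensus is only a rough sanity test; MotifLocator scores come from position weight matrices, so a hit with a score below 1.000 need not match the consensus string exactly.

```python
import re
from typing import List, NamedTuple, Optional

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


class Hit(NamedTuple):
    start: int              # 1-based, inclusive (assumption)
    end: int
    strand: str             # '+' or '-'
    score: Optional[float]  # MotifLocator score, None if absent


class Entry(NamedTuple):
    name: str
    accession: str
    consensus: str
    described: bool         # True if the row carries a leading asterisk
    hits: List[Hit]


# "25-32 + (0.940)"; the score and its parentheses are optional.
HIT_RE = re.compile(r"(\d+)-(\d+)\s*([+-])(?:\s*\(?(\d\.\d+)\)?)?")


def parse_entry(line: str) -> Entry:
    """Parse one Transfac row; rows without three comma fields raise ValueError."""
    head, _, tail = line.partition(":")
    described = head.startswith("*")
    name, accession, consensus = (
        field.strip().strip("'") for field in head.lstrip("*").split(",")
    )
    hits = [
        Hit(int(start), int(end), strand, float(score) if score else None)
        for start, end, strand, score in HIT_RE.findall(tail)
    ]
    return Entry(name, accession, consensus, described, hits)


def instance(block: str, hit: Hit) -> str:
    """Return the motif instance, reverse-complemented for '-' strand hits."""
    site = block.upper()[hit.start - 1:hit.end]
    return site if hit.strand == "+" else site.translate(COMPLEMENT)[::-1]


def compatible(site: str, consensus: str) -> bool:
    """True if every position shares at least one base with the consensus."""
    return len(site) == len(consensus) and all(
        set(IUPAC[s]) & set(IUPAC[c]) for s, c in zip(site, consensus)
    )


# Block pax6 3.3 and its En-1 row, copied from the table above.
block_3_3 = "GCATAATTAAGGGAAGATCTAAAGAAAGACAATTACCAGATGGTCT"
entry = parse_entry("En-1, M00396, GTANTNN: 30-36 - (1.000)")
site = instance(block_3_3, entry.hits[0])
print(entry.name, site, compatible(site, entry.consensus))
# prints: En-1 GTAATTG True
```

Under these assumptions, the same helpers reproduce the AML-1a instance in block pax6 3.1 (positions 20-25 on the '+' strand read TGTGGT), which suggests the 1-based inclusive reading of the coordinates is the intended one.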