
Table 1 Consensus sequences for the most significant groups of word pairs

From: Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts

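Columns: group; hexamer list for word 1; compiled sequence 1; transcription factor (TF) for consensus 1; hexamer list for word 2; compiled sequence 2; TF for consensus 2; number of word pairs. Within each hexamer list, indentation follows the staggered alignment of the overlapping hexamers.

Group 1
  Hexamer list for word 1:
    GAGATG
    GCGATG
      AGATGA
      CGATGA
        GATGAG
          ATGAGA
          ATGAGC
            TGAGAT
            TGAGCT
              GAGATG
                AGATGA
                AGCTCA
  Compiled sequence 1: GMGATGAGMTSA; TF: Unknown (PAC motif [38])
  Hexamer list for word 2:
    TGAAAA
      GAAAAA
        AAAAAT
          AAAATT
            AAATTT
  Compiled sequence 2: TGAAAATTT; TF: Unknown (RRPE motif [38])
  Number of word pairs: 75

Group 2
  Hexamer list for word 1:
    AAGTGA
      AATGAA
      AGTGAA
        ATGAAA
        CTGAAA
          TGAAAA
  Compiled sequence 1: ANTGAAAAA; TF: Unknown (RRPE motif [38])
  Hexamer list for word 2:
    GAAAAA
    GAAAAT
      AAAATT
        AAATTT
  Compiled sequence 2: GAAAAWTT; TF: Unknown (RRPE motif [38])
  Number of word pairs: 40

Group 3
  Hexamer list for word 1:
    GTTCCC
      CTCCCC
        ACCCCT
        TCCCCT
  Compiled sequence 1: GYWCCCCT; TF: (motif 38 [26])
  Hexamer list for word 2:
    CCCTTT
      CCTTTT
      CCTTAT
  Compiled sequence 2: CCCTTWT; TF: (motif 38 [26])
  Number of word pairs: 5

Group 4*
  Hexamer list for word 1:
    GGCGGC
      GCGGCT
  Compiled sequence 1: GGCGGCT; TF: Ume6p
  Hexamer list for word 2:
    GTGGCA
      GGCAAA
  Compiled sequence 2: GTGGCAAA; TF: Rpn4p
  Number of word pairs: 2

Group 5
  Hexamer list for word 1:
    CCCTTT
      CCTTTT
  Compiled sequence 1: CCCTTTT; TF: Msn2/4p-like
  Hexamer list for word 2:
    GGAGAA
      GGGAAA
  Compiled sequence 2: GGRGAAA; TF: Hsf1p
  Number of word pairs: 2

Group 6
  Hexamer list for word 1:
    CGGCGG
  Compiled sequence 1: CGGCGG; TF: Ume6p
  Hexamer list for word 2:
    TACCCC
    ACCCCA
      CCCCAA
  Compiled sequence 2: TACCCCAA; TF: Mig1p
  Number of word pairs: 3

Group 7*
  Hexamer list for word 1:
    CCGCGG
  Compiled sequence 1: CCGCGG; TF: Pdr1/3p
  Hexamer list for word 2:
    CGGAAA
  Compiled sequence 2: CGGAAA; TF: Unknown
  Number of word pairs: 1

Group 8
  Hexamer list for word 1:
    AAACGC
      GACGCG
      AACGCG
        ACGCGT
        ACGCGA
        TCGCGT
          CGCGTC
  Compiled sequence 1: ARWCGCGW; TF: Mbp1p
  Hexamer list for word 2:
    CGCGAA
      ACGAAA
      GCGAAA
        CGAAAC
        CGAAAA
  Compiled sequence 2: CRCGAAAM; TF: Swi4/6p
  Number of word pairs: 9

Group 9
  Hexamer list for word 1:
    TCACGT
      CACGTG
        ACGTGC
  Compiled sequence 1: TCACGTGC; TF: Cbf1p
  Hexamer list for word 2:
    ACTGTG
      CTGTGG
        TGTGGC
          GTGGCT
  Compiled sequence 2: ACTGTGGCT; TF: Met31/32p
  Number of word pairs: 6

Group 10
  Hexamer list for word 1:
    TATTTT
      TTTTGT
        TTTGTT
        ATTGTT
  Compiled sequence 1: TWTTGTT; TF: Fkh1/2p
  Hexamer list for word 2:
    TGTTTA
      GTTTAC
  Compiled sequence 2: TGTTTAC; TF: Fkh1/2p
  Number of word pairs: 4

Group 11
  Hexamer list for word 1:
    TTTGTT
      TTGTTT
  Compiled sequence 1: TTTGTTT; TF: Fkh1/2p
  Hexamer list for word 2:
    TTTTTC
    TTTTTT
  Compiled sequence 2: TTTTTY; TF: TnC
  Number of word pairs: 4

Group 12*
  Hexamer list for word 1:
    TCGTTT
      CGTTTA
  Compiled sequence 1: TCGTTTA; TF: Ecm22p | Upc2p
  Hexamer list for word 2:
    CCGATA
      CGATAA
  Compiled sequence 2: CCGATAA; TF: Hap1p
  Number of word pairs: 4

Group 13
  Hexamer list for word 1:
    TCGTTT
      CGTTTA
  Compiled sequence 1: TCGTTTA; TF: Ecm22p | Upc2p
  Hexamer list for word 2:
    TATTGT
      ATTGTT
  Compiled sequence 2: TATTGTT; TF: Rox1p
  Number of word pairs: 2

Group 14
  Hexamer list for word 1:
    CGTTTC
      GTTTCT
  Compiled sequence 1: CGTTTCT; TF: Ecm22p | Upc2p
  Hexamer list for word 2:
    TTCTTT
      TCTTTT
        CTTTTT
  Compiled sequence 2: TTCTTTTT; TF: TnC
  Number of word pairs: 5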
The output P × C matrix of word pairs (P) that were significantly associated (p < 0.001) with five or more environmental conditions (C) was ordered using hierarchical clustering. Numbers correspond to the groups of overlapping word pairs indicated in Figure 4. Asterisks denote sequence pairs whose involvement in multifactorial regulation has not been previously reported. Compiled sequences were assembled from groups of word pairs found in adjacent rows in the ordering of Kolmogorov-Smirnov (K-S) p-values. Because individual words had to pass all three statistical tests to be included in the output matrix, these consensus sequences may not reflect the actual biological specificities of the conserved transcription factor binding sites (refer to [26, 36] for a more complete list). Residues are shown in bold if they are invariant in at least two hexamers. Multiple transcription factors that may bind the same sequence motif are separated by |. IUPAC codes used: K (G or T); M (A or C); N (any base); R (A or G); S (C or G); W (A or T); Y (C or T).
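The compiled sequences are degenerate consensus strings written with IUPAC codes, so whether a listed hexamer is consistent with its compiled sequence can be checked position by position: each base must belong to the set of bases allowed by the code at that position. The Python sketch below illustrates this, assuming standard IUPAC semantics. The names `IUPAC`, `matches_at`, `match_offsets` and `expand` are hypothetical helpers written for illustration; this is not the procedure the authors used to assemble the consensus sequences.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each code maps to the set of bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "N": "ACGT", "R": "AG",
    "S": "CG", "W": "AT", "Y": "CT",
}


def matches_at(consensus: str, word: str, offset: int) -> bool:
    """Return True if every base of `word` is allowed by `consensus` at `offset`."""
    if offset < 0 or offset + len(word) > len(consensus):
        return False
    window = consensus[offset:offset + len(word)]
    return all(base in IUPAC[code] for base, code in zip(word, window))


def match_offsets(consensus: str, word: str) -> list[int]:
    """List every 0-based start position at which `word` fits inside `consensus`."""
    return [i for i in range(len(consensus) - len(word) + 1)
            if matches_at(consensus, word, i)]


def expand(consensus: str) -> list[str]:
    """Enumerate every concrete DNA sequence denoted by a degenerate consensus."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in consensus))]


if __name__ == "__main__":
    # Word-1 hexamers and compiled sequence 1 of group 1, copied from Table 1.
    compiled = "GMGATGAGMTSA"
    for hexamer in ["GAGATG", "GCGATG", "AGATGA", "CGATGA", "GATGAG",
                    "ATGAGA", "ATGAGC", "TGAGAT", "TGAGCT", "AGCTCA"]:
        print(hexamer, match_offsets(compiled, hexamer))
    # The two concrete sequences covered by compiled sequence 2 of group 5.
    print(expand("GGRGAAA"))
```

For group 1, GAGATG fits GMGATGAGMTSA at offsets 0 and 5, and AGATGA at offsets 1 and 6, which is consistent with each of these hexamers appearing twice in the staggered word-1 list.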