
Table 1 Consensus sequences for the most significant groups of word pairs

From: Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts

 

| Group | Hexamer list for word 1 | Compiled sequence 1 | TF for consensus 1 | Hexamer list for word 2 | Compiled sequence 2 | TF for consensus 2 | Number of word pairs |
|---|---|---|---|---|---|---|---|
| 1 | GAGATG, GCGATG, AGATGA, CGATGA, GATGAG, ATGAGA, ATGAGC, TGAGAT, TGAGCT, GAGATG, AGATGA, AGCTCA | GMGATGAGMTSA | Unknown (PAC motif [38]) | TGAAAA, GAAAAA, AAAAAT, AAAATT, AAATTT | TGAAAATTT | Unknown (RRPE motif [38]) | 75 |
| 2 | AAGTGA, AATGAA, AGTGAA, ATGAAA, CTGAAA, TGAAAA | ANTGAAAAA | Unknown (RRPE motif [38]) | GAAAAA, GAAAAT, AAAATT, AAATTT | GAAAAWTT | Unknown (RRPE motif [38]) | 40 |
| 3 | GTTCCC, CTCCCC, ACCCCT, TCCCCT | GYWCCCCT | (motif 38 [26]) | CCCTTT, CCTTTT, CCTTAT | CCCTTWT | (motif 38 [26]) | 5 |
| 4* | GGCGGC, GCGGCT | GGCGGCT | Ume6p | GTGGCA, GGCAAA | GTGGCAAA | Rpn4p | 2 |
| 5 | CCCTTT, CCTTTT | CCCTTTT | Msn2/4p-like | GGAGAA, GGGAAA | GGRGAAA | Hsf1p | 2 |
| 6 | CGGCGG | CGGCGG | Ume6p | TACCCC, ACCCCA, CCCCAA | TACCCCAA | Mig1p | 3 |
| 7* | CCGCGG | CCGCGG | Pdr1/3p | CGGAAA | CGGAAA | Unknown | 1 |
| 8 | AAACGC, GACGCG, AACGCG, ACGCGT, ACGCGA, TCGCGT, CGCGTC | ARWCGCGW | Mbp1p | CGCGAA, ACGAAA, GCGAAA, CGAAAC, CGAAAA | CRCGAAAM | Swi4/6p | 9 |
| 9 | TCACGT, CACGTG, ACGTGC | TCACGTGC | Cbf1p | ACTGTG, CTGTGG, TGTGGC, GTGGCT | ACTGTGGCT | Met31/32p | 6 |
| 10 | TATTTT, TTTTGT, TTTGTT, ATTGTT | TWTTGTT | Fkh1/2p | TGTTTA, GTTTAC | TGTTTAC | Fkh1/2p | 4 |
| 11 | TTTGTT, TTGTTT | TTTGTTT | Fkh1/2p | TTTTTC, TTTTTT | TTTTTY | TnC | 4 |
| 12* | TCGTTT, CGTTTA | TCGTTTA | Ecm22p \| Upc2p | CCGATA, CGATAA | CCGATAA | Hap1p | 4 |
| 13 | TCGTTT, CGTTTA | TCGTTTA | Ecm22p \| Upc2p | TATTGT, ATTGTT | TATTGTT | Rox1p | 2 |
| 14 | CGTTTC, GTTTCT | CGTTTCT | Ecm22p \| Upc2p | TTCTTT, TCTTTT, CTTTTT | TTCTTTTT | TnC | 5 |

  1. The output P × C matrix of word pairs (P) that were significantly associated (p < 0.001) with five or more environmental conditions (C) was ordered using hierarchical clustering. Numbers correspond to the groups of overlapping word pairs indicated in Figure 4. Asterisks denote sequence pairs whose involvement in multifactorial regulation has not been previously reported. Compiled sequences were assembled from groups of word pairs found in adjacent rows in the ordering of K-S p-values. Because individual words had to pass all three statistical tests to be included in the output matrix, these consensus sequences may not reflect the actual biological specificities of conserved transcription factor binding sites (refer to [26, 36] for a more complete list). Residues are shown in bold if they are invariant in at least two hexamers. Multiple transcription factors that may bind the same sequence motif are separated by |. IUPAC codes used: K (G or T); M (A or C); R (A or G); S (C or G); W (A or T).
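
To make the relationship between the hexamer lists and the compiled sequences concrete, the following is a minimal Python sketch that expands an IUPAC consensus into a regular expression and reports the offsets at which a hexamer is compatible with it. It is an illustration only, not the procedure the authors used to compile the sequences. The function names `consensus_to_regex` and `hexamer_offsets` are hypothetical, and the codes Y (C or T) and N (any base) are standard IUPAC codes assumed here because they occur in the table but are not listed in the footnote.

```python
import re

# IUPAC nucleotide codes. K, M, R, S and W are taken from the table footnote;
# Y and N are standard IUPAC codes added as an assumption, because they
# appear in the compiled sequences but are not listed in the footnote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT",
    "Y": "CT", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    parts = []
    for code in consensus:
        bases = IUPAC[code]
        # Unambiguous codes map to themselves; ambiguous ones become a class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def hexamer_offsets(consensus, hexamer):
    """Return every offset at which `hexamer` is compatible with `consensus`."""
    offsets = []
    width = len(hexamer)
    for start in range(len(consensus) - width + 1):
        window = consensus[start:start + width]
        if re.fullmatch(consensus_to_regex(window), hexamer):
            offsets.append(start)
    return offsets


# Group 3, word 2: hexamers CCCTTT, CCTTTT and CCTTAT, compiled as CCCTTWT.
print(consensus_to_regex("CCCTTWT"))         # CCCTT[AT]T
print(hexamer_offsets("CCCTTWT", "CCCTTT"))  # [0]
print(hexamer_offsets("CCCTTWT", "CCTTTT"))  # [1]
print(hexamer_offsets("CCCTTWT", "CCTTAT"))  # [1]
```

The offsets show how overlapping hexamers tile the compiled sequence: CCCTTT starts at position 0, while CCTTTT and CCTTAT both start at position 1 and differ only at the residue encoded by W.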