
Table 5 The most significant indicators of the degree of tissue specificity: start CpG island, TATA box, and YY1 site

From: Promoter features related to tissue specificity as measured by Shannon entropy

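| CGI | TATA | YY1 | Total fraction | H 0-3 (most specific) | H 3-4 (semi-specific) | H 4-5 (least specific) |
|---|---|---|---|---|---|---|
| | | | **3,552** | **271** | **602** | **2,679** |
| | | | 1.00 | 0.08 | 0.17 | 0.75 |
| CGI+ | | | **2,434** | **56** | **306** | **2,072** |
| | | | 0.69 | 0.02 | 0.13 | 0.85 |
| | | | | 0.30 | 0.74 | 1.13 |
| CGI- | | | **1,118** | **215** | **296** | **607** |
| | | | 0.31 | 0.19 | 0.26 | 0.54 |
| | | | | 2.52 | 1.56 | 0.72 |
| | TATA+ | | **604** | **136** | **175** | **293** |
| | | | 0.17 | 0.23 | 0.29 | 0.49 |
| | | | | 2.95 | 1.71 | 0.64 |
| | TATA- | | **2,949** | **135** | **427** | **2,387** |
| | | | 0.83 | 0.05 | 0.14 | 0.81 |
| | | | | 0.60 | 0.85 | 1.07 |
| CGI+ | TATA+ | | **284** | **19** | **82** | **183** |
| | | | 0.08 | 0.07 | 0.29 | 0.64 |
| | | | | 0.88 | 1.70 | 0.85 |
| CGI- | TATA+ | | **320** | **117** | **93** | **110** |
| | | | 0.09 | 0.37 | 0.29 | 0.34 |
| | | | | 4.79 | 1.71 | 0.46 |
| CGI+ | TATA- | | **2,150** | **37** | **224** | **1,889** |
| | | | 0.61 | 0.02 | 0.10 | 0.88 |
| | | | | 0.23 | 0.61 | 1.16 |
| CGI- | TATA- | | **798** | **98** | **203** | **497** |
| | | | 0.22 | 0.12 | 0.25 | 0.62 |
| | | | | 1.61 | 1.50 | 0.83 |
| | | YY1+ | **293** | **1** | **16** | **276** |
| | | | 0.08 | 0.00 | 0.05 | 0.94 |
| | | | | 0.04 | 0.32 | 1.25 |
| CGI+ | | YY1+ | **261** | **1** | **10** | **250** |
| | | | 0.07 | 0.00 | 0.04 | 0.96 |
| | | | | 0.05 | 0.23 | 1.27 |
| CGI+ | | YY1- | **2,173** | **55** | **296** | **1,822** |
| | | | 0.61 | 0.03 | 0.14 | 0.84 |
| | | | | 0.33 | 0.80 | 1.11 |
| CGI- | | YY1- | **1,086** | **215** | **290** | **581** |
| | | | 0.31 | 0.20 | 0.27 | 0.53 |
| | | | | 2.59 | 1.58 | 0.71 |
| CGI- | | YY1+ | **32** | **0** | **6** | **26** |
| | | | 0.01 | 0.00 | 0.19 | 0.81 |
| | | | | 0.00 | 1.11 | 1.08 |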

The three columns on the left indicate the combination of features considered; empty cells indicate that the feature is not considered. The 'Total fraction' column gives the number of promoters with each feature combination (in bold) and the corresponding fraction of all genes considered. The three columns on the right give statistics for matching genes in three bands of tissue specificity, defined by Shannon entropy H. The top two lines give the number and corresponding fraction of all genes considered in each band. For each feature combination, the entries give the number (top, bold), fraction (middle), and enrichment ratio (bottom) of matching genes in each band. The fraction is the share of promoters with that feature combination whose genes fall in the band; the enrichment ratio is this fraction divided by the band's fraction among all genes considered. For example, specific genes are best recognized by the combination of a TATA box (TATA+) and the lack of a CpG island (CGI-), which enriches the fraction of such genes from 8% to 37%, a factor of 4.79. Nonspecific genes are most specifically recognized by the combination of CpG islands and YY1 sites: this returns a set that is 96% nonspecific genes, but it matches only 250 of the 2,679 nonspecific genes (about 9%).
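To make the definitions in the table note concrete, the sketch below recomputes a row's fraction and enrichment ratio from the raw counts. It assumes the band sizes in the table's top row (271, 602 and 2,679 genes) are the background denominators; the names `BAND_TOTALS`, `BandStats` and `band_stats` are hypothetical. It also computes the share of each band's genes that a combination matches (labelled "recall" here), a quantity the table does not list directly.

```python
# Minimal sketch: recompute Table 5's derived statistics from its raw counts.
# Assumption: band sizes are copied from the table's top row; all names here
# (BAND_TOTALS, BandStats, band_stats) are hypothetical.
from dataclasses import dataclass

# Number of genes in each Shannon-entropy band (3,552 genes in total).
BAND_TOTALS = {"H 0-3": 271, "H 3-4": 602, "H 4-5": 2679}
ALL_GENES = sum(BAND_TOTALS.values())


@dataclass
class BandStats:
    count: int         # genes with the feature combination that fall in this band
    fraction: float    # count / all genes with the feature combination
    enrichment: float  # fraction / (band size / ALL_GENES)
    recall: float      # count / band size: share of the band that is matched


def band_stats(counts: dict[str, int]) -> dict[str, BandStats]:
    """Return per-band statistics for one feature combination's band counts."""
    matched = sum(counts.values())
    stats = {}
    for band, n in counts.items():
        fraction = n / matched
        background = BAND_TOTALS[band] / ALL_GENES
        stats[band] = BandStats(n, fraction, fraction / background, n / BAND_TOTALS[band])
    return stats


# CGI- TATA+ row: 117 / 93 / 110 genes in the three bands.
s = band_stats({"H 0-3": 117, "H 3-4": 93, "H 4-5": 110})["H 0-3"]
print(f"{s.fraction:.2f} {s.enrichment:.2f}")  # 0.37 4.79

# CGI+ YY1+ row: 1 / 10 / 250 genes in the three bands.
r = band_stats({"H 0-3": 1, "H 3-4": 10, "H 4-5": 250})["H 4-5"]
print(f"{r.fraction:.2f} {r.recall:.2f}")  # 0.96 0.09
```

Rounded to two decimals, the outputs match the printed entries for these rows, which supports reading the enrichment ratio as the row fraction divided by the band's share of all genes.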