Figure 2 | Genome Biology

Figure 2

From: A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators

Robustness to variation in the size of the analysis window. This figure consists of three parts, ( a ) - ( c ). Part a shows the number of transcription factor binding sites (Tfbs) as a function of the size of the promoter window. As expected, the number of Tfbs increased progressively with window size. However, the rate of increase gradually declined until the growth became linear. The point of this transition was the presumed boundary of the proximal promoter. To localize this boundary more precisely, we fitted a local polynomial regression (loess) model and plotted its first derivative in part b. For all three subsets of ENCODE, there was a clear transition point where the rate of change (∆Tfbs) became constant (marked with ‘*’), at a distance of approximately 3,000 base pairs from the transcription start site (TSS), i.e., a promoter window of 6,000 base pairs. Thus, the outer boundary of promoters was estimated at 3,000 base pairs from the TSS. In part c, we show that the correlation between the breadth of expression and the number of transcription factor binding sites was robust to variation in window size, although its strength decreased as the analysis window grew. This observation suggested that the Tfbs controlling the breadth of expression were enriched close to the TSS. Note that the analyses described here used either the 2011 or the 2012 ENCODE data freeze. The 2011 meta data set included 2.7 million peaks for 148 transcription factors, derived from 71 cell types with 24 additional experimental cell culture conditions [31]. Peak scores ranged from 0 to 1,000. We used either all data or only high-quality peaks with a score above 500. The 2012 data freeze, a broader data set, consisted of 161 transcription factors and 91 human cell types with various treatment conditions [32].
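The boundary-finding idea in part b can be illustrated with a minimal Python sketch. This is not the authors' code: it swaps in a hand-rolled tricube-weighted local linear fit as a simple stand-in for loess, and it runs on synthetic counts rather than ENCODE data. The function names, the span of 1,000 base pairs, the 5% tolerance and the shape of the synthetic curve are all assumptions made for illustration only, so the boundary it reports has no bearing on the 3,000 base pair estimate in the caption.

```python
import math

def local_linear_slope(xs, ys, x0, span):
    """Slope of a tricube-weighted straight-line fit centred on x0.

    The local slope serves as an estimate of the first derivative at x0.
    `span` is the half-width of the window (hypothetical choice).
    """
    ws = []
    for x in xs:
        u = abs(x - x0) / span
        ws.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return sxy / sxx

def plateau_start(xs, slopes, tail_fraction=0.25, tolerance=0.05):
    """Smallest x from which every slope stays within `tolerance`
    (relative) of the mean slope of the outermost windows."""
    n_tail = max(1, int(len(slopes) * tail_fraction))
    plateau = sum(slopes[-n_tail:]) / n_tail
    boundary = None
    # Walk inward from the largest distance until the slope leaves the band.
    for x, s in zip(reversed(xs), reversed(slopes)):
        if abs(s - plateau) > tolerance * abs(plateau):
            break
        boundary = x
    return boundary

# Synthetic stand-in for "Tfbs count vs. distance from TSS":
# a saturating component plus a constant linear background (assumed shape).
xs = [250.0 * i for i in range(41)]  # 0 .. 10,000 bp
ys = [40.0 * (1 - math.exp(-x / 800.0)) + 0.002 * x for x in xs]

slopes = [local_linear_slope(xs, ys, x0, span=1000.0) for x0 in xs]
boundary = plateau_start(xs, slopes)
print("estimated distance where the rate of change becomes constant:", boundary)
```

The tail of the derivative curve defines the constant background rate, and the boundary is taken as the innermost distance beyond which the derivative never leaves a narrow band around that rate. On real, noisy counts the span and tolerance would need tuning, and a full loess implementation would likely be preferable to this simplified fit.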
