Genomic mapping of RNA polymerase II reveals sites of co-transcriptional regulation in human cells
© Brodsky et al.; licensee BioMed Central Ltd. 2005
Received: 4 January 2005
Accepted: 17 June 2005
Published: 15 July 2005
Transcription by RNA polymerase II is regulated at many steps including initiation, promoter release, elongation and termination. Accumulation of RNA polymerase II at particular locations across genes can be indicative of sites of regulation. RNA polymerase II is thought to accumulate at the promoter and at sites of co-transcriptional alternative splicing where the rate of RNA synthesis slows.
To further understand transcriptional regulation at a global level, we determined the distribution of RNA polymerase II within regions of the human genome designated by the ENCODE project. Hypophosphorylated RNA polymerase II localizes almost exclusively to 5' ends of genes. On the other hand, localization of total RNA polymerase II reveals a variety of distinct landscapes across many genes with 74% of the observed enriched locations at exons. RNA polymerase II accumulates at many annotated constitutively spliced exons, but is biased for alternatively spliced exons. Finally, RNA polymerase II is also observed at locations not in gene regions.
Localizing RNA polymerase II across many millions of base pairs in the human genome identifies novel sites of transcription and provides insights into the regulation of transcription elongation. These data indicate that RNA polymerase II accumulates most often at exons during transcription. Thus, a major factor of transcription elongation control in mammalian cells is the coordination of transcription and pre-mRNA processing to define exons.
Transcriptional and post-transcriptional regulation of gene expression intersect at RNA polymerase II. The rate of polymerase II movement is altered by loading of transcription factors at the promoter, chromatin structure, pre-mRNA processing, elongation control and termination [1–3]. Thus, polymerase II accumulates at promoters as well as at different locations across a particular gene, but the general patterns across many different genes have yet to be explored. Numerous factors such as histones, post-translation modifying enzymes, and RNA-binding proteins regulate these processes [1, 3]. One key determinant of transcription is the phosphorylation state of the carboxy-terminal domain (CTD) of polymerase II [5, 6], which becomes hyperphosphorylated during transcription elongation [4, 6–9]. Much of our understanding of transcription elongation comes from work in prokaryotes and yeast where most genes are intronless [1, 3]. Transcription and pre-mRNA processing are coordinated, as the two processes affect the efficiency of each other [2, 10]. The spatial patterns of the different phosphorylation states of polymerase II across genes remain poorly understood in mammalian systems.
Results and discussion
To explore the range of locations where polymerase II accumulates across the genome, we performed chromatin immunoprecipitation (ChIP) from HeLa S3 cells, and profiled the purified DNA using an oligonucleotide-tiled microarray interrogating the Encyclopedia of DNA Elements (ENCODE) regions covering 471 known genes. Two antibodies were used, 8WG16 and 4H8, which recognize the hypophosphorylated (PolIIa) or a phosphorylation-independent state of the CTD of polymerase II (PolII), respectively. Thus, the 4H8 antibody recognizes the total polymerase II population. Isolated DNA was amplified using a multiple displacement amplification (MDA) strategy (see Materials and methods).
Table 1. Summary of RNA polymerase II locations. Row categories: RefSeq total exons (first, terminal and internal exons); geneid or sgpGene; active gene introns; no RefSeq overlap; knownGene total exons (first, terminal and internal exons); no RefSeq or knownGene; geneid or sgpGene.
Levels of RNA polymerase II enrichment at internal exons can vary between genes. To examine whether these patterns are influenced by expression levels, two categories were created: genes with multiple PolII enrichments at internal exons; and genes with PolII at one or zero internal exons. When compared to the mRNA levels, there is no significant difference between the two categories, suggesting that the number of PolII sites across the gene does not vary significantly with RNA levels. Genes with observable PolII enrichment at internal exons are correlated with higher mRNA levels on the expression array. This is consistent with reports proposing the use of PolII ChIP to monitor gene expression. Therefore, the number of PolII sites at internal exons may reflect different levels of transcription elongation control and not just the sensitivity of the experiment.
Most of the hypophosphorylated PolIIa locations at internal exons also overlap a transcription initiation site, as the internal exon in question is often the second exon in the gene. Only two enrichment sites overlap with an internal exon without also being near the first exon of a transcript. One of these is at a CpG island in the MCF2L gene and the other may be an alternative transcription initiation site as annotated in the HG17 assembly at the beginning of the ITGB4BP gene. To classify the remaining sites within introns or in intergenic regions, enrichment sites were compared to other gene databases. As summarized in Table 1, four PolIIa sites are in introns, but three of these are within resolution of annotated or predicted exons, leaving only one location not overlapping an exon of some kind. There are 28 hypophosphorylated polymerase sites not in a RefSeq gene region. After following a similar filtering approach, only 14 sites remain that are not near a putative exon. Thus, only 14% of PolIIa-enriched locations do not overlap with a known exon or actively transcribed region. Additional data file 2 lists PolIIa sites at predicted exons that are probably newly identified transcription initiation locations in HeLa cells. Figure 5 shows two examples of PolII and RNA signal at new sites of transcription. From the pattern of enrichments it is probable that many of these predicted exons are real and are transcription initiation locations, given the observed strong bias of the 8WG16 antibody for transcription initiation locations in well annotated genes.
To determine the generality of these observations, all RNA polymerase II occupancy sites were compared with the known genes and RefSeq databases, version HG16. PolIIa is highly enriched for the first exons around transcription initiation sites (Figure 4) representing 77 of 551 known genes in HG16 on the array (see Additional data file 1 for the entire lists).
Elongation control is a common transcriptional regulation mechanism believed to affect a wide range of functional gene classes. In particular, RNA polymerase II pausing has been proposed to be associated with alternative splicing. To determine if there is a bias for alternative exons, we counted all the annotated alternatively spliced exons in the knownGene database and determined the distribution of PolII enrichment locations on them. PolII is enriched at 57% of the annotated alternatively spliced exons of the active genes compared to 37% of annotated actively transcribed constitutively expressed exons. We also examined the distribution of all PolII p-values on different types of exons. Each exon was mapped to the smallest p-value ChIP-enriched site that overlaps the exon. The cassette exons are found to be more significantly associated with smaller p-values compared to constitutively expressed exons according to the two-sample Kolmogorov-Smirnov test with a two-sided p-value of less than 0.0035.
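The comparison of p-value distributions between exon classes can be illustrated with a minimal sketch of the two-sample Kolmogorov-Smirnov statistic. The function name and the toy p-values are illustrative assumptions, not data from this study, and only the statistic D (the largest gap between the two empirical distribution functions) is computed, not its significance.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    difference between the two empirical cumulative distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical per-exon p-values (smallest overlapping ChIP site per exon).
cassette_p = [0.001, 0.002, 0.01, 0.03]
constitutive_p = [0.02, 0.05, 0.2, 0.4]
d_stat = ks_statistic(cassette_p, constitutive_p)
```

A larger D indicates that the two exon classes have more clearly separated p-value distributions; converting D into a p-value requires the Kolmogorov distribution, which is omitted here.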
One attractive hypothesis is that sites of exon enrichment may reflect weaker splice sites where PolII stalls during splice site recognition. Using two different empirical methods to estimate splice site strength, no significant differences are observed between the exons overlapping PolII and those that do not [17, 18]. Alternatively, some of the annotated constitutively expressed exons may actually be subject to alternative splicing decisions. Kampa et al. suggest that the levels of alternative splicing are much higher than commonly believed and annotated in the human genome from their examination of expression on tiled arrays. Consistent with these findings, RNA polymerase II sites may be predicting which exons are being co-transcriptionally alternatively spliced.
To determine if there is any pattern for the 120 PolII enrichment sites that are in RefSeq introns, we compared these sites to knownGene, genscan, geneid, and sgpGene databases and find 31 within resolution of putative exons. Of the remaining 89, 57 are in genes with PolII enrichment sites that also overlap exons, suggesting that they are actively transcribed genes. No clear intronic positional bias is observed.
In conclusion, we have identified new sites of RNA polymerase II accumulation across hundreds of genes in mammalian cells. The large majority of polymerase II-enriched locations are at actively transcribed exons with a bias towards annotated alternatively spliced exons. Many of the PolII sites at annotated constitutively expressed exons may be sites of alternative splicing. Whatever the eventual splicing decision, these observations suggest that events around exons slow transcription elongation. A recent study suggests that even general splicing factors may slow elongation. Stalling of RNA polymerase II near exons may function to slow RNA synthesis in order to wait for the competition of myriad splicing signals to be resolved in order to define the exon [21, 22]. These ChIP data identify where these states of RNA polymerase II are localizing across the ENCODE regions.
Across genes, these data are consistent with the hypothesis of transcriptional pausing at particular locations. Alternatively, it is possible that RNA polymerase II is rearranging during transcription such that the epitope is only accessible around exons. Thus, the conformation of polymerase II may be changing and not the transcription rate. Nonetheless, it is interesting that the majority of observable elongating polymerase II accumulates around exons, suggesting that a major feature of transcription elongation control is coupling to pre-mRNA processing.
These observations differ from those observed in intronless genes typically found in prokaryotes and yeast where a more uniform PolII enrichment is observed across genes. What appears to be conserved is PolII accumulation in coding regions compared to intronic regions. These data highlight the complexity and gene-specific nature of transcription regulation not only at transcription initiation and termination locations but at specific exons. Together, these observations suggest that a major feature of transcription elongation control in mammalian cells is exon definition. Thus, these data provide new insights into the coordination of transcription and pre-mRNA processing in mammalian cells.
Materials and methods
Chromatin immunoprecipitation and DNA amplification
Chromatin immunoprecipitations (ChIP) were performed as described, with the following modifications. HeLa S3 cells were first crosslinked with dimethyl adipimidate (DMA) (Pierce) for 10 min, washed with PBS and then crosslinked with formaldehyde for 10 min. Cells were collected, lysed, and chromatin was sheared by sonication to an average length of 1 kb as determined after RNase treatment of the samples on an agarose gel. Chromatin was prepared from four independently grown batches of cells and pooled to generate three replicate immunoprecipitations (IP) and six input samples. Briefly, 8WG16 (Covance) and 4H8 (AbCam) antibodies were incubated with a 50:50 mix of Dynal protein A/G beads for more than 16 h at 4°C in PBS with 5 mg/ml BSA. After washing in PBS, beads with bound antibody were incubated with chromatin from approximately 2 × 10^7 cells for more than 16 h at 4°C. Beads were washed eight times with RIPA buffer (50 mM HEPES pH 7.6, 1 mM EDTA, 0.7% DOC, 1% IGEPAL, 0.5 M LiCl) before DNA was eluted at 65°C in TE/1% SDS. Crosslinks were reversed by incubating at 65°C for more than 12 h followed by proteinase K treatment, phenol extraction and RNase treatment. Isolated DNA was then amplified isothermally using random nonamer primers and Klenow polymerase (Invitrogen) for more than 4 h, yielding approximately 2 μg of DNA per IP. DNA was prepared and hybridized on Affymetrix ENCODE oligonucleotide tiled arrays using the fragmentation, hybridization, staining and scanning procedure described by Kennedy et al. Affymetrix ENCODE microarrays have interrogating 25mer oligonucleotide probes tiled every 20 bp on average. A sample of chromatin was set aside before IP and used to represent the input DNA.
Tiled array analysis
Quantile normalization was used to make the distribution of probe intensities the same for all arrays. In the case of the Affymetrix GTRANS software, quantile normalization is used within treatment and control replicate sets. Non-parametric methods based on ranks were used to identify ChIP-enriched regions. These methods make mild assumptions about the data distributions and are insensitive to outlying observations. A p-value was calculated for every assay probe on the array. The set of probes used in the calculation of this p-value was defined by a bandwidth parameter b. All probes centered on the chromosome at positions less than b bases 5' or 3' of the given probe position are included in this set.
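The two steps described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GTRANS implementation: the function names are hypothetical, ties are broken by probe order rather than averaged, and probe positions are treated as single integer coordinates.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution by replacing each
    value with the mean, across arrays, of the values holding the same rank."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity at each rank across arrays.
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda k: a[k])  # probe indices, lowest first
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def probes_in_bandwidth(positions, center, b):
    """Indices of probes located less than b bases 5' or 3' of `center`."""
    return [k for k, p in enumerate(positions) if abs(p - center) < b]


# Hypothetical toy data: two arrays of three probes, and five probe positions.
norm = quantile_normalize([[5, 2, 3], [4, 1, 6]])
window = probes_in_bandwidth([0, 20, 40, 60, 80], center=40, b=30)
```

After normalization every array contains the same set of values, assigned according to each probe's rank within its own array, so the rank-based tests that follow compare like with like.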
The Wilcoxon rank sum test, also known as the Mann-Whitney U test, is the basis of the p-value statistic computed by the Affymetrix GTRANS software. The control and treatment observation sets are, respectively, the sets of normalized control and normalized treatment intensities from all replicates and all probes within the bandwidth. The null hypothesis is that the treatment set mean is no larger than that of the control set.
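A one-sided rank sum test of this kind can be sketched as follows. The sketch assumes a normal approximation to the null distribution without a tie correction; how GTRANS derives its p-value is not described here, so the function names and the approximation are illustrative assumptions only.

```python
import math


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_p_value(treatment, control):
    """One-sided p-value for 'treatment tends to be larger than control',
    using a normal approximation (assumption; no tie correction)."""
    n_t, n_c = len(treatment), len(control)
    ranks = midranks(list(treatment) + list(control))
    w = sum(ranks[:n_t])  # rank sum of the treatment observations
    mean_w = n_t * (n_t + n_c + 1) / 2
    var_w = n_t * n_c * (n_t + n_c + 1) / 12
    z = (w - mean_w) / math.sqrt(var_w)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of a standard normal


# Hypothetical normalized intensities pooled from one probe window.
p_enriched = rank_sum_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
```

In practice the treatment and control lists would hold all replicate intensities for every probe within the bandwidth of the probe being scored.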
To take into account probe-to-probe variability we used a generalization of the Wilcoxon signed-rank test for blocked data. All input and IP normalized, sign(PM-MM)max(1,|PM-MM|) intensities (where PM are perfect match and MM are mismatched probes) interrogating the same chromosomal location were assigned to the same block. Aligned observations were derived by subtracting the median normalized intensity for a given block from each observation in that block. All aligned observations within the bandwidth were ranked. A statistic W was defined as the sum of the ranks of the aligned IP observations. A p-value was derived from W, based on the joint null distribution of the aligned input and IP ranks. The analyses depend on the assumption that probes are independent. Probes were mapped to the genomic coordinates to ensure that no probe mapped to more than one location in any 1,000-bp window and that no two probes map to the same genomic location.
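The construction of the statistic W can be sketched as below. This is an illustrative reading of the description above: the function names and data layout are assumptions, and the final step of deriving a p-value from the joint null distribution of the aligned ranks is not shown.

```python
from statistics import median


def pm_mm_intensity(pm, mm):
    """sign(PM-MM) * max(1, |PM-MM|), the intensity defined in the text."""
    diff = pm - mm
    sign = (diff > 0) - (diff < 0)
    return sign * max(1, abs(diff))


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def aligned_rank_statistic(blocks):
    """blocks: one (input_values, ip_values) pair per chromosomal location
    within the bandwidth. Returns W, the rank sum of aligned IP observations."""
    aligned = []  # (aligned value, is_ip)
    for input_values, ip_values in blocks:
        block_median = median(list(input_values) + list(ip_values))
        aligned += [(v - block_median, False) for v in input_values]
        aligned += [(v - block_median, True) for v in ip_values]
    ranks = midranks([value for value, _ in aligned])
    return sum(r for r, (_, is_ip) in zip(ranks, aligned) if is_ip)


# Hypothetical window with two probe locations, two input and two IP values each.
w_stat = aligned_rank_statistic([([1, 2], [5, 6]), ([10, 11], [14, 15])])
```

Subtracting the block median removes the probe-specific baseline, so a probe that is uniformly bright in both input and IP does not inflate W.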
RNA samples were isolated from HeLa S3 cells and purified with trizol (Invitrogen) and RNeasy (Qiagen). RNA was amplified and hybridized to Affymetrix U133 Plus 2 arrays using standard methods. Three biological replicates were quantile normalized. Gene expression was indicated by the median of PM-MM values over all probes. The hypothesis of difference in gene expression between groups of genes, based on median PM-MM, was tested using the Wilcoxon rank sum statistic. For hybridization to the ENCODE tiled array, RNA was similarly isolated and double-stranded cDNA was generated using Invitrogen Superscript cDNA synthesis kit. cDNA (1-1.5 μg) was hybridized to the tiled array. Three biological replicates were performed for each RNA array.
Sites were determined to be near a genomic annotation if they were within the apparent 1,000 bp resolution. Sites shorter than 1,000 bp were scaled in size to include 1,000 bp around the center of the site. Sites that were longer than 1,000 bp used the data-determined length for their resolution size. Databases were downloaded from the University of California at Santa Cruz (UCSC) Golden Path Genome Browser and loaded into a local MySQL database. Exons were compared and classified as one or more of the following: start, terminal, alternatively spliced, constitutive or cassette. Because the arrays were designed using the HG15 assembly, the data were compared to this version of the human genome unless otherwise noted. The active gene list was defined as those with PolIIa at the first exon of the gene.
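The proximity rule can be sketched as a simple interval comparison. This is one possible reading of the rule: it assumes that "1,000 bp around the center" means a total width of 1,000 bp, that coordinates are half-open integer intervals, and that a site is "near" an annotation when the resulting window overlaps it. All names are hypothetical.

```python
RESOLUTION = 1000  # bp, the apparent resolution stated in the text


def site_window(start, end, resolution=RESOLUTION):
    """Widen sites shorter than the resolution to `resolution` bp centered
    on the site; longer sites keep their data-determined length."""
    if end - start >= resolution:
        return start, end
    center = (start + end) // 2
    half = resolution // 2
    return center - half, center + half


def overlaps_annotation(site, exons, resolution=RESOLUTION):
    """True if the (possibly widened) site overlaps any exon (start, end)."""
    s, e = site_window(site[0], site[1], resolution)
    return any(s < exon_end and exon_start < e for exon_start, exon_end in exons)


# Hypothetical coordinates: a 200-bp site and a nearby exon.
near = overlaps_annotation((5000, 5200), [(5550, 5700)])
```

Running each enriched site against the exon intervals of each annotation track in turn gives the kind of per-database classification described above.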
PCR primer pairs were designed to amplify 100-bp fragments from selected genomic regions (see Additional data file 8). Each real-time PCR reaction contained 50 nM primers, approximately 1 ng DNA and 1 × ABI SYBR PCR reaction mix. A fluorescence value proportional to the initial quantity of target DNA was calculated by a log-linear regression analysis for each quadruplicate amplification curve. We normalized this value to an input chromatin sample, then normalized this ratio to a reference gene, PAPT, which is not expressed in HeLa cells, to calculate a relative enrichment value for the target: (Target_IP / Target_Input) / (PAPT_IP / PAPT_Input).
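The relative enrichment calculation amounts to a ratio of two ratios, as in the minimal sketch below. The function and argument names are hypothetical, and the numbers in the usage line are invented for illustration; each argument stands for the fluorescence-derived starting quantity estimated for that sample.

```python
def relative_enrichment(target_ip, target_input, ref_ip, ref_input):
    """(Target_IP / Target_Input) / (Ref_IP / Ref_Input).

    Dividing by the reference-gene ratio corrects for background recovery
    of non-enriched DNA in the immunoprecipitation."""
    return (target_ip / target_input) / (ref_ip / ref_input)


# Hypothetical quantities: target recovered 4x over input, reference 1x.
enrichment = relative_enrichment(8.0, 2.0, 1.0, 1.0)
```

A value near 1 means the target behaves like the unexpressed reference gene, while larger values indicate enrichment in the IP.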
All data are available from the Gene Expression Omnibus (GEO) under accession number GSE2735.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a table listing PolIIa annotated to refGene. Additional data file 2 is a table listing PolIIa annotated to known genes. Additional data file 3 is a table listing PolIIa annotated to RefSeq. Additional data file 4 is a table listing PolII annotated to known genes. Additional data file 5 is a table listing PolII annotated to genscan exons. Additional data file 6 is a table listing knownGene and RefSeq populations on the ENCODE array. Additional data file 7 is a table listing the PolIIa-defined active gene list. Additional data file 8 is the PCR primer list and annotation.
We thank Pamela Hollasch, Maura Berkeley and the DFCI Affymetrix core for all their assistance, and Jason Carroll and Jessica Hurt for critical reading of the manuscript. We thank Adnan Derti for trying some splice-site strength analysis. This work was funded by an NHGRI K22 career award, HG02488-01A1 (A.S.B.), and a DOD grant DAMD17-02-0364 (P.A.S.).
- Arndt KM, Kane CM: Running with RNA polymerase: eukaryotic transcript elongation. Trends Genet. 2003, 19: 543-550. 10.1016/j.tig.2003.08.008.
- Kornblihtt AR, de la Mata M, Fededa JP, Munoz MJ, Nogues G: Multiple links between transcription and splicing. RNA. 2004, 10: 1489-1498. 10.1261/rna.7100104.
- Sims RJ, Belotserkovskaya R, Reinberg D: Elongation by RNA polymerase II: the short and long of it. Genes Dev. 2004, 18: 2437-2468. 10.1101/gad.1235904.
- Cheng C, Sharp PA: RNA polymerase II accumulation in the promoter-proximal region of the dihydrofolate reductase and gamma-actin genes. Mol Cell Biol. 2003, 23: 1961-1967. 10.1128/MCB.23.6.1961-1967.2003.
- Dahmus ME: Reversible phosphorylation of the C-terminal domain of RNA polymerase II. J Biol Chem. 1996, 271: 19009-19012.
- Komarnitsky P, Cho EJ, Buratowski S: Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes Dev. 2000, 14: 2452-2460. 10.1101/gad.824700.
- Boehm AK, Saunders A, Werner J, Lis JT: Transcription factor and polymerase recruitment, modification, and movement on dhsp70 in vivo in the minutes following heat shock. Mol Cell Biol. 2003, 23: 7628-7637. 10.1128/MCB.23.21.7628-7637.2003.
- Kim M, Ahn SH, Krogan NJ, Greenblatt JF, Buratowski S: Transitions in RNA polymerase II elongation complexes at the 3' ends of genes. EMBO J. 2004, 23: 354-364. 10.1038/sj.emboj.7600053.
- Ahn SH, Kim M, Buratowski S: Phosphorylation of serine 2 within the RNA polymerase II C-terminal domain couples transcription and 3' end processing. Mol Cell. 2004, 13: 67-76. 10.1016/S1097-2765(03)00492-1.
- Hirose Y, Tacke R, Manley JL: Phosphorylated RNA polymerase II stimulates pre-mRNA splicing. Genes Dev. 1999, 13: 1234-1239.
- The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004, 306: 636-640. 10.1126/science.1105136.
- Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, Bray-Ward P, Sun Z, Zong Q, Du Y, Du J, et al: Comprehensive human genome amplification using multiple displacement amplification. Proc Natl Acad Sci USA. 2002, 99: 5261-5266. 10.1073/pnas.082089499.
- Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, et al: Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004, 116: 499-509. 10.1016/S0092-8674(04)00127-8.
- Sandoval J, Rodriguez JL, Tur G, Serviddio G, Pereda J, Boukaba A, Sastre J, Torres L, Franco L, Lopez-Rodas G: RNAPol-ChIP: a novel application of chromatin immunoprecipitation to the analysis of real-time gene transcription. Nucleic Acids Res. 2004, 32: e88. 10.1093/nar/gnh091.
- Enriquez-Harris P, Levitt N, Briggs D, Proudfoot NJ: A pause site for RNA polymerase II is associated with termination of transcription. EMBO J. 1991, 10: 1833-1842.
- Kim M, Krogan NJ, Vasiljeva L, Rando OJ, Nedea E, Greenblatt JF, Buratowski S: The yeast Rat1 exonuclease promotes transcription termination by RNA polymerase II. Nature. 2004, 432: 517-522. 10.1038/nature03041.
- Shapiro MB, Senapathy P: RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 1987, 15: 7155-7174.
- Zhang MQ, Marr TG: A weight array method for splicing signal analysis. Comput Appl Biosci. 1993, 9: 499-509.
- Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G, et al: Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res. 2004, 14: 331-342. 10.1101/gr.2094104.
- Ujvari A, Luse DS: Newly Initiated RNA encounters a factor involved in splicing immediately upon emerging from within RNA polymerase II. J Biol Chem. 2004, 279: 49773-49779. 10.1074/jbc.M409087200.
- Roberts GC, Gooding C, Mak HY, Proudfoot NJ, Smith CW: Co-transcriptional commitment to alternative splice site selection. Nucleic Acids Res. 1998, 26: 5568-5572. 10.1093/nar/26.24.5568.
- Robson-Dixon ND, Garcia-Blanco MA: MAZ elements alter transcription elongation and silencing of the fibroblast growth factor receptor 2 exon IIIb. J Biol Chem. 2004, 279: 29075-29084. 10.1074/jbc.M312747200.
- Ren B, Cam H, Takahashi Y, Volkert T, Terragni J, Young RA, Dynlacht BD: E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes Dev. 2002, 16: 245-256. 10.1101/gad.949802.
- Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, Su X, Cao M, Chen W, Zhang J, et al: Large-scale genotyping of complex DNA. Nat Biotechnol. 2003, 21: 1233-1237. 10.1038/nbt869.
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185.
- Hollander M, Wolfe DA: Nonparametric Statistical Methods. 1999, New York: John Wiley, 2nd edition.
- Ostermeier GC, Liu Z, Martins RP, Bharadwaj RR, Ellis J, Draghici S, Krawetz SA: Nuclear matrix association of the human beta-globin locus utilizing a novel approach to quantitative real-time PCR. Nucleic Acids Res. 2003, 31: 3257-3266. 10.1093/nar/gkg424.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.