Development of an Illumina-based ChIP-exonuclease method provides insight into FoxA1-DNA binding properties
© Serandour et al.; licensee BioMed Central Ltd. 2013
Received: 23 August 2013
Accepted: 27 December 2013
Published: 27 December 2013
ChIP-exonuclease (ChIP-exo) is a modified ChIP-seq approach for high resolution mapping of transcription factor DNA sites. We describe an Illumina-based ChIP-exo method which provides a global improvement of the data quality of estrogen receptor (ER) ChIP and insights into the motif structure for key ER-associated factors. ChIP-exo of the ER pioneer factor FoxA1 identifies protected DNA with a predictable 8 bp overhang from the Forkhead motif, which we term mesas. We show that mesas occur in multiple cellular contexts and exist as single or overlapping motifs. Our Illumina-based ChIP-exo provides high resolution mapping of transcription factor binding sites.
Since the last decade, ChIP-on-chip and ChIP-seq technologies have considerably increased our understanding of the functional organization of the genome . These technologies allow the genome-wide mapping of chromatin-associated proteins and histone marks. ChIP-seq is now commonly used to study a wide range of biological processes including transcription, replication, DNA repair and evolution of the genome [2, 3]. ChIP-seq of transcription factors is particularly useful to determine their bound DNA motifs and target genes. Nevertheless the resolution of ChIP-seq is inadequate to resolve positional information between different motifs within binding sites; additionally, overlaps between different ChIP-seq datasets can be exaggerated due to the width of peaks. The precise determination of the bound DNA motifs and their positions relative to other motifs is of importance for understanding the features involved in transcription factor-DNA interactions, an important level of information when considering, for example, GWAS-identified single nucleotide polymorphisms (SNPs) within a transcription factor binding site .
Recent studies from Pugh and colleagues report a SOLiD platform-based method called ChIP-exonuclease (ChIP-exo), which greatly increases the resolution of ChIP peaks [5, 6]. To date, these experiments have been mostly limited to yeast models. Due to the fact that the Illumina sequencing platform is currently the most common sequencing technology, we sought to adapt ChIP-exo from the SOLiD to the Illumina platform. We apply the Illumina-based ChIP-exo to human cancer cell line experiments and directly compare the resolution of the peaks to ChIP-seq. We find ChIP-exo to outperform ChIP-seq by all parameters, revealing unexpected insight into the FoxA1-DNA interface in breast cancer cells and in mouse liver.
Results and discussion
An Illumina-based ChIP-exo method
After an Illumina ChIP-seq library preparation, each ChIPed DNA fragment is ligated to the P7 and P5 adapters on both sides. The single-end sequencing of the ChIP-seq library results in two shifted populations of reads, one mapped on the top strand and the other mapped on the bottom strand (Additional file 3: Figure S2). These two shifted populations of reads are taken into consideration to estimate the centre of the peak using the peak caller MACS . After our Illumina ChIP-exo library preparation, each ChIPed DNA fragment results in two library fragments: one with the P5 adapter ligated downstream of the exonuclease digestion-protected DNA and the other with the P5 adapter ligated upstream of it. In each case, the P7 adapter is ligated to the other extremity. The single-end sequencing of the ChIP-exo library results in two overlapping populations of reads, one mapped on the top strand and the other mapped on the bottom strand.
Comparative analysis of ChIP-seq and ChIP-exo
Using ChIP-seq, it has been challenging to resolve structure between functionally related transcription factors that bind to adjacent sequences and operate as a complex. For example, three key transcription factors involved in ER-DNA interactions are ER, FoxA1 and GATA3 [8, 9]. Using the higher resolution of ChIP-exo, we measured the density of ER, FoxA1 and GATA3 motifs around ER summits. Figure 2D and 2E show the density of ER/FoxA1/GATA3 motifs around the ER seq-summits and ER exo-summits. The ChIP-exo ER motif density distribution is narrower than that of ChIP-seq with characteristic widths of 88 bp and 114 bp, respectively (see Methods), indicating ChIP-exo peak summits are called more consistently near the locations of transcription factor binding sites. Additionally, in both ChIP-seq and ChIP-exo, the ER, FoxA1 and GATA3 motifs are enriched near the ER summits, but the increased resolution of ChIP-exo peak summits affords clearer appreciation of how the transcription factors associate: namely GATA3 motifs are adjacent to the central ERE and further away from the GATA motifs are the Forkhead motifs (representing FoxA1 binding domains). This pattern appears to show a predictable structure that these three key breast cancer factors form in defining a transcriptionally active cis-regulatory element, a finding revealed by the increased resolution derived from ChIP-exo.
Insights into transcription factor binding
We provide a protocol for ChIP-exo based on the commonly used Illumina sequencing platform. As ChIP-seq provided a substantial improvement over ChIP-chip in the accuracy of peak calling and ability to distinguish nearby binding sites [3, 15], our data strongly suggest that ChIP-exo outperforms ChIP-seq in the ability to discriminate nearby peaks and small peaks. In addition, it can reveal insights into the patterns of transcription factor binding to the DNA, including the prediction of transcription factor dimer binding. We also show for the first time that ChIP-exo is feasible in primary tissue such as mouse liver. We believe that the ChIP-exo technology can help characterise the architecture of the cis-regulatory elements, particularly with regards to highlighting the cooperativity between transcription factors.
All data are deposited in ArrayExpress with accession number E-MTAB-1827. Figure 3A includes mouse FoxA1 ChIP-seq data deposited under the ArrayExpress accession numbers E-MTAB-223  and E-MTAB-1414 .
Materials and methods
MCF7, MDA-MB231, MDA-MB453, LNCaP and ZR75-1 human cell lines were obtained from ATCC and grown in DMEM or RPMI (LNCaP and ZR75-1) supplemented with 10 % FBS. The liver material was isolated from three adult (4 months) C57/BL6 males obtained from Cancer Research UK Cambridge Institute. The investigation was approved by the ethics committee and followed the Cambridge Institute guidelines for the use of animals in experimental studies under home office license PPL80/2197.
The antibodies used for the ChIP-exo were anti-FoxA1 (ab5089) from Abcam, anti-ER (sc-543) and anti-FOXA2 (sc-6554) from Santa Cruz Biotechnologies.
Chromatin immunoprecipitation sequencing
The ChIPs were performed as described previously , using 10 ug of anti-ER antibody (Santa Cruz, ref. sc-543). The ChIP-seq and the input libraries were prepared using the TruSeq ChIP Sample Prep Kit (Illumina, ref. IP-202-1012).
Chromatin immunoprecipitation-exonuclease on Illumina sequencing platform
The main differences between our protocol and the Pugh protocol  are the oligonucleotides sequences, different washing buffer, the use of magnetic beads and the PCR mix. For the lambda exonuclease digestion, we have tested the Pugh conditions (10 units for 30 min) and a higher concentration (50 units for 1 h) on an ER ChIP-exo conducted in MCF-7 cells (Additional file 9). We found no significant difference in peak width with increased exonuclease concentration.
End polishing: 1 mM ATP, 100 uM dNTP, 15 U T4 DNA polymerase, 5 U Klenow DNA polymerase, 50 U T4 PolyNucleotide Kinase, in 100 uL 1× NEBuffer 2 (50 mM NaCl, 10 mM Tris–HCl, 10 mM MgCl2, 1 mM DTT, pH 7.9) at 30°C for 30 min.
Ligation of the P7 exo-adapter: 1 mM ATP, 150 pmol P7 exo-adapter, 2000 U T4 DNA ligase, in 100 uL 1× NEBuffer 2 at 25°C for 60 min.
Nick repair: 150 uM dNTP, 15 U phi29 DNA polymerase in 100 uL 1× phi29 reaction buffer (50 mM Tris–HCl pH 7.5, 10 mM MgCl2, 10 mM (NH4)2SO4, 1 mM DTT, pH 7.5) at 30°C for 20 min.
Lambda exonuclease digestion: 10 U Lambda exonuclease in 100 uL 1× NEB Lambda exonuclease buffer (67 mM Glycine-KOH, 2.5 mM MgCl2, 50 μg/mL BSA, pH 9.4) at 37°C for 30 min.
RecJf exonuclease digestion: 30 U RecJf exonuclease in 100 uL NEBuffer 2 at 37°C for 30 min.
Elution and reverse cross-linking: the beads are incubated with 100 ug of Proteinase K in 200 uL of elution buffer (50 mM Tris HCl pH 8; 10 mM EDTA; 1% SDS) overnight at 65°C. The 200 uL of supernatant is transferred to a new tube and diluted in 200 uL TE (10 mM Tris, 1 mM EDTA, pH 7.4). The DNA is purified using phenol-chloroform-isoamyl alcohol extraction followed by ethanol precipitation. The resulting DNA pellet is dissolved in 20 uL water. The DNA can be stored at this step at −20°C.
P7 primer extension: the 20 uL of DNA is denaturated 5 min at 95°C, then mixed with 5 pmol of the P7 primer and incubated in 50 uL 1× NEB Phi29 reaction buffer for 5 min at 65°C and 2 min at 30°C in a thermocycler. After the addition of 10 U Phi29 DNA polymerase and 200 uM dNTP, the mix is incubated 20 min at 30°C and then 10 min at 65°C. The DNA is purified using AMPure beads (1.8 volume) and eluted in 20 uL of resuspension buffer (Tris-Acetate 10 mM pH 8).
Ligation of the P5 exo-adapter: the 20 uL of DNA is mixed with 15 pmol of the P5 exo-adapter, 2,000 U T4 DNA ligase and incubated in 50 uL 1× NEB T4 DNA ligase buffer for 60 min at 25°C and then 10 min at 65°C. The DNA is purified using AMPure beads (1.8 volume) and eluted in 20 uL of resuspension buffer (Tris-Acetate 10 mM pH 8).
PCR amplification: the DNA sample is amplified using 0.5 uM of the universal reverse PCR primer and the forward PCR primer containing the index sequence of choice in 50 uL 1× NEBNext High-Fidelity PCR Master Mix (New England Biolabs, M0541). The number of PCR cycles is 13 to 18, depending on the ChIP efficiency. The PCR product is purified using AMPure beads (1.8 volume) and eluted in 20 uL of resuspension buffer (Tris-Acetate 10 mM pH 8).
Gel-size selection: 200 to 300 bp PCR product is purified from a 2% agarose gel using MinElute Gel Extraction Kit (Qiagen) and eluted in 20 uL of elution buffer.
Illumina sequencing: the library is quantified using the KAPA library quantification kit for Illumina sequencing platforms (KAPA Biosystems, KK4824) and sequenced on a MiSeq, GAII or HiSeq following the manufacturer’s protocol.
The oligonucleotides were synthesised by Sigma-Aldrich and purified by HPLC (sequences in Table 3). The P7 exo-adapter and the P5 exo-adapter were obtained in mixing the couple of complement oligonucleotides in an Annealing Buffer (10 mM Tris pH 8, 50 mM NaCl, 1 mM EDTA) and annealed by heating 5 min at 95°C then let cool slowly to room temperature. The oligonucleotides designed for ChIP-exo are adapted from the oligonucleotide sequences © 2007–2012 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorised for use with Illumina instruments and products only. All other uses are strictly prohibited.
After sequencing, reads were aligned to human genome version GRCh37 (hg19) using BWA version 0.7.5a, and BAM-formatted files were created using samtools version 0.1.18. Reads with mapping quality less than 5 were discarded; reads overlapping ENCODE’s ‘signal artefact’ regions were also discarded . These regions show significant signal for all or most transcription factors and histone marks, across many cell lines, so are presumed to be artefactual.
ER and related factors
Example peaks in figures, showing the top- and bottom-strand reads in red and blue, respectively, were made by splitting the reads into top- and bottom-strand, then generating bedgraph files for both using custom software, converting them to bigwig using UCSC’s ‘bedGraphToBigWig’ software , making overlay track hubs for the two strands, and viewing them with the UCSC genome browser [19, 20].
Triplicates of ER ChIP-exo and -seq were converted to consensus peaks by identifying locations in which all three replicates had summits within 100 bp, and choosing the strongest peak’s summit as the true summit. Overlaps between the consensus summit sets were calculated by considering peaks to overlap if their summits were within 100 bp, using the BEDTools package . Read density around summits was calculated using custom software; plots were made using the ‘ggplot2’ R package [22, 23].
Motif frequency (Additional file 6: Figure S5) was calculated by scanning for motifs within 100 bp of peak summits using FIMO , then counting the number of regions with at least one motif with p-value <0.0025. Motif strength was calculated the same way, except taking the strongest motif (lowest p-value) in each region as the defining one.
Motif density analysis
To calculate the densities of ER, FoxA1 and GATA3 motifs (Figure 2D and 2E), we analysed the genomic sequence 250 bp upstream and downstream of ER summits. Motif occurrences were found using FIMO and TRANSFAC motifs at a p-value threshold of 10-3. The number of motif occurrences at each position relative to the ER summit was summed and normalised by the total number of motif occurrences. Motif density profiles were smoothed using a weighted moving average in 20 bp windows where weights are shaped as an isosceles triangle and the central point is given the maximum weight. The characteristic width of the ER motif density was computed by finding the width of the region where the density is greater than 1 in 500 base pairs. Matrices used: ER: M01801, FoxA1: M00724, GATA3: M00351.
FoxA1 mesas and related analyses
Figure 3B shows the occurrence of forkhead motifs around positions where the read depth on one strand increases by 100 reads between one nucleotide and the next. Some such positions lack motifs; conversely some positions with smaller increase in read depth have correctly positioned motifs. Three parameters may be varied: increase in read depth, presence of motif with some p-value, and the stringency of the positioning (9 bp is most common, but 8 bp and 10 bp also occur with some frequency). A range of parameter combinations was tried (data not shown); in the end, a true mesa is defined as one with a read depth increase of at least 30 bp and a forkhead motif with the correct orientation relative to the mesa, a p-value < = 0.0025, and a motif position of 8 to 10 bp from the read depth increase. These values were chosen because they reflected reasonably clear inflection points in the plots of mesa occurrences as parameters changed. Peaks were classified as paired mesas if they had mesas on the top and bottom strands, with paired motifs in palindromic orientation and their trailing ends within 5 bp of each other.
Hepatocyte nuclear factor 3-alpha
Hepatocyte nuclear factor 3-beta
Genome-wide association study
Model-based analysis of ChIP-Seq data
Single nucleotide polymorphism.
We thank K. Stefflova and S. Watt for having provided us the cross-linked mouse livers. This work is supported by the ERC Starting Grant ER_PARTNERS.
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, Lajoie BR, Landt SG, Lee BK, Pauli F, Rosenbloom KR, Sabo P, Safi A, Shoresh N, Simon JM, Song L, Trinklein ND, Altshuler RC, Birney E, Brown JB, Cheng C, Djebali S, Dong X, Dunham I, et al: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489: 57-74. 10.1038/nature11247.View ArticleGoogle Scholar
- Furey TS: ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat Rev Genet. 2012, 13: 840-852. 10.1038/nrg3306.PubMedPubMed CentralView ArticleGoogle Scholar
- Park PJ: ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009, 10: 669-680. 10.1038/nrg2641.PubMedPubMed CentralView ArticleGoogle Scholar
- Cowper-Sal Lari R, Zhang X, Wright JB, Bailey SD, Cole MD, Eeckhoute J, Moore JH, Lupien M: Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat Genet. 2012, 44: 1191-1198. 10.1038/ng.2416.PubMedPubMed CentralView ArticleGoogle Scholar
- Rhee HS, Pugh BF: Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011, 147: 1408-1419. 10.1016/j.cell.2011.11.013.PubMedPubMed CentralView ArticleGoogle Scholar
- Rhee HS, Pugh BF: Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature. 2012, 483: 295-301. 10.1038/nature10799.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS: Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9: R137-10.1186/gb-2008-9-9-r137.PubMedPubMed CentralView ArticleGoogle Scholar
- Hurtado A, Holmes KA, Ross-Innes CS, Schmidt D, Carroll JS: FOXA1 is a key determinant of estrogen receptor function and endocrine response. Nat Genet. 2011, 43: 27-33. 10.1038/ng.730.PubMedPubMed CentralView ArticleGoogle Scholar
- Theodorou V, Stark R, Menon S, Carroll JS: GATA3 acts upstream of FOXA1 in mediating ESR1 binding by shaping enhancer accessibility. Genome Res. 2013, 23: 12-22. 10.1101/gr.139469.112.PubMedPubMed CentralView ArticleGoogle Scholar
- Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR, Fox EA, Silver PA, Brown M: Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell. 2005, 122: 33-43. 10.1016/j.cell.2005.05.008.PubMedView ArticleGoogle Scholar
- Lupien M, Eeckhoute J, Meyer CA, Wang Q, Zhang Y, Li W, Carroll JS, Liu XS, Brown M: FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell. 2008, 132: 958-970. 10.1016/j.cell.2008.01.018.PubMedPubMed CentralView ArticleGoogle Scholar
- Laganiere J, Deblois G, Lefebvre C, Bataille AR, Robert F, Giguere V: From the cover: location analysis of estrogen receptor alpha target promoters reveals that FOXA1 defines a domain of the estrogen response. Proc Natl Acad Sci USA. 2005, 102: 11651-11656. 10.1073/pnas.0505575102.PubMedPubMed CentralView ArticleGoogle Scholar
- Jankowski A, Szczurek E, Jauch R, Tiuryn J, Prabhakar S: Comprehensive prediction in 78 human cell lines reveals rigidity and compactness of transcription factor dimers. Genome Res. 2013, 23: 1307-1318. 10.1101/gr.154922.113.PubMedPubMed CentralView ArticleGoogle Scholar
- Littler DR, Alvarez-Fernandez M, Stein A, Hibbert RG, Heidebrecht T, Aloy P, Medema RH, Perrakis A: Structure of the FoxM1 DNA-recognition domain bound to a promoter sequence. Nucleic Acids Res. 2010, 38: 4527-4538. 10.1093/nar/gkq194.PubMedPubMed CentralView ArticleGoogle Scholar
- Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S: Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007, 4: 651-657. 10.1038/nmeth1068.PubMedView ArticleGoogle Scholar
- Stefflova K, Thybert D, Wilson MD, Streeter I, Aleksic J, Karagianni P, Brazma A, Adams DJ, Talianidis I, Marioni JC, Flicek P, Odom DT: Cooperativity and rapid evolution of cobound transcription factors in closely related mammals. Cell. 2013, 154: 530-540. 10.1016/j.cell.2013.07.007.PubMedPubMed CentralView ArticleGoogle Scholar
- Schmidt D, Wilson MD, Spyrou C, Brown GD, Hadfield J, Odom DT: ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions. Methods. 2009, 48: 240-248. 10.1016/j.ymeth.2009.03.001.PubMedPubMed CentralView ArticleGoogle Scholar
- Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D: BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010, 26: 2204-2207. 10.1093/bioinformatics/btq351.PubMedPubMed CentralView ArticleGoogle Scholar
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Rres. 2002, 12: 996-1006.View ArticleGoogle Scholar
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.PubMedPubMed CentralView ArticleGoogle Scholar
- R: A language and environment for statistical computing. http://www.r-project.org,
- Wickham H: ggplot2: Elegant Graphics for Data Analysis. 2009, New York: Springer Publishing Company, IncorporatedView ArticleGoogle Scholar
- Grant CE, Bailey TL, Noble WS: FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011, 27: 1017-1018. 10.1093/bioinformatics/btr064.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.