Materials and resource table
See the Additional file 5: Table S4.
Cell culture, cell cycle synchronization, and incorporation of thymidine analogs for NAIL-seq
GM12878 and K562 (3111C0001CCC000039, National Infrastructure of Cell line resource, China) cells were cultured in RPMI1640 media supplied with 15% FBS as described previously [1, 67]. For G1 arrest, GM12878 or K562 cells were incubated with 1 μM or 5 μM palbociclib (SelleckChem) for 36 h, respectively. For EdU and BrdU dual-labeling experiments, G1-arrested GM12878 and K562 cells were washed with pre-warmed RPMI1640 and released into fresh medium for 3 or 2.5 h, respectively. Next, the cells were cultured with 10 μM EdU for 15 min, followed by a wash with pre-warmed RPMI1640, and then cultured with 50 μM BrdU for an additional 15 min. For EdU/HU treatment, G1-synchronized GM12878 or K562 cells were released into fresh medium with 10 μM EdU plus 5 mM HU for 24 h or 10 μM EdU plus 10 mM HU 12 h, respectively. The concentration of EdU was comparable with that of endogenous dTTP [68]. EdU is efficiently incorporated into nascent DNA, and a 2-min period is sufficient for EdU to label nascent Okazaki fragments in OK-seq [10]. Moreover, G1-released K562 cells require an additional 2.5 h to reach the G1/S transition. Therefore, the vast majority of initiation sites are subjected to EdU labeling in the EdU/HU assay. For ⍺-amanitin or DRB mediated-RNA polymerase II inhibition, G1-arrested K562 cells were released into fresh media containing the indicated concentrations of ⍺-amanitin or DRB with HU and EdU for 12 h.
Murine embryonic stem V6.5 (mES-V6.5) cells were grown in 2i media containing 15% FBS, 1 μM PD0325901 (SelleckChem), 3 μM CHIR99021 (SelleckChem), 1000 U/mL mouse LIF (Millipore), and other essential supplements as described previously [69]. mES-V6.5 cells were cultured on 0.2% gelatin (Sigma)-treated plates and synchronized by thymidine and nocodazole as described previously [46]. Mitotically arrested cells were shaken off, washed with pre-warmed 2i media, and cultured in fresh media for 1.5 h, followed by sequential labeling with 10 μM EdU and 50 μM BrdU for 10 min each. To reduce replication fork progression speed, M phase-synchronized mES cells were released into fresh media and incubated with 4 mM HU and 50 μM EdU for 3 h. To induce degradation of RNA polymerase II, homozygous mAID-tagged RNA polymerase II cells were first synchronized as wild-type mES cells. Next, 1 μg/mL doxycycline (Dox) was added to induce production of OsTIR1. In total, 500 μM IAA was added 1 h before the cells were released from the nocodazole treatment to induce degradation of RNA polymerase II. The nocodazole-arrested cells were then cultured in the presence of Dox, IAA, EdU, and HU for 3 h before harvesting.
Wild-type C57BL/6NCr mice (6–8 weeks old) were subjected to splenic B cell purification by following the manufacturer’s instructions included with the EasySep Mouse B cell isolation kit (STEMCELL). Both male and female mice were indiscriminately used in this study. Mice were housed and handled according to the standards set by the Peking University laboratory animal center, and all animal experiments were approved by the institutional animal care and use committee at Peking University. Naïve splenic B cells were isolated and activated with LPS (25 mg/mL; Sigma) and IL-4 (5 ng/mL; Sigma). B cells were treated with 10 mM HU and 10 μM EdU for 28 h beginning from the activation [15]. For ⍺-amanitin treatment, primary B cells were incubated with 10 mM HU, 10 μM EdU, and ⍺-amanitin for 28 h after LPS and IL-4 were added.
NAIL-seq
Cells were harvested, washed with PBS, and subjected to genomic DNA (gDNA) extraction with Proteinase K digestion as described previously [70]. Purified genomic DNA was subjected to the Click reaction with 100 μM Biotin-PEG-azide, 2 mM CuSO4, 4 mM THPTA, and 10 mM sodium ascorbate at 25 °C for 1 h, followed by DNA precipitation. The copper in the Click reaction hydrolyzes gDNA and fragments it into small fragments (around 200~400 bp in length), so that each fragmented nascent DNA has little chance to contain both EdU and BrdU simultaneously. The dissolved DNA was denatured at 95 °C for 5 min and phosphorylated by T4 PNK at 37 °C for 30 min.
For EdU- and BrdU-labeled samples, the denatured and modified DNA was purified by anti-BrdU (BU1/75) and Protein G beads to isolate the BrdU-labeled portion, and the supernatants were denatured again and incubated with Streptavidin C1 beads (Dynabeads) to isolate the EdU-labeled portion. For EdU/HU-labeled samples, T4 PNK-treated DNA was incubated with C1 beads alone. All of the enrichment steps were performed at room temperature for 4 h.
Isolated EdU- or BrdU-labeled ssDNA samples were ligated onto the beads with two types of bridge adapters (Bridge adapter -1 and -2, see Additional file 5: Table S4 for sequence details) simultaneously at room temperature for at least 4 h. The ligation reaction volume contained 1 μM of each Bridge adapter, 1,600 U T4 DNA ligase and 15% PEG-8000. Next, the ligated DNA was carefully washed and tagged with Illumina P5-I5 and P7-I7 sequences as described elsewhere [70]. All libraries were subjected to 2 × 150 bp Hiseq sequencing.
ORC2 and MCM5 ChIP-seq
For the ORC2 ChIP-seq, ORC2 tagged with Avi-tag [71] at its N-terminal was inserted into the pMAX-GFP (Lonza) backbone to replace GFP. Bacterial BirA [71] was inserted into pX330 to replace Cas9. Ten million K562 cells transfected with 30 μg BirA and 30 μg Avi-ORC2 plasmids were immediately arrested at the G1 phase by 5 μM palbociclib for 36 h and then incubated with or without 10 μg/mL ⍺-amanitin for 12 h in the presence of palbociclib. For MCM5, wild-type K562 cells were treated with palbociclib for 36 h and then cultured in the presence or absence of 10 μg/mL ⍺-amanitin for 12 h with palbociclib.
Cells were fixed by 1% formaldehyde (F1635, Sigma) for 10 min at room temperature and quenched by 125 mM glycine for 5 min. Ten million fixed cells were washed twice with PBS and lysed with ice-cold NP40 lysis buffer (10 mM Tris-HCl, pH 7.5; 150 mM NaCl; 0.05% NP40) on ice for 15 min. The cell lysate was separated by 24% (w/v) sucrose in NP40 lysis buffer. The nuclei pellet was washed with 1 mM EDTA/PBS, resuspended in glycerol buffer (20 mM Tris-HCl, pH 8.0; 75 mM NaCl; 0.5 mM EDTA; 0.85 mM DTT; 50% glycerol), and lysed using nuclei lysis buffer (10 mM HEPES, pH 7.6; 1 mM DTT; 7.5 mM MgCl2; 0.2 mM EDTA; 0.3 M NaCl; 1 M urea; 1% NP40). Chromatin pellets were washed twice with 1 mM EDTA/PBS and resuspended in sonication buffer (20 mM Tris-HCl, pH 8.0; 150 mM NaCl; 2 mM EDTA; 0.1% SDS; 1% Triton X-100; 4 mM CaCl2) containing 40 U MNase for 15 min at 37 °C. The nuclease was quenched by adding 5 mM EDTA and 5 mM EGTA. The chromatin in the samples was further fragmented by sonication and separated from the soluble fraction.
For ORC2 ChIP-seq, the supernatants were pre-cleared using 40 μL protein G dynabeads for 1 h and then incubated with 40 μL Streptavidin T1 beads (Dynabeads) for 6 h at 4 °C. The beads were carefully washed with freshly made 2% SDS, high-salt buffer (500 mM NaCl; 0.1% SDS; 1% Triton X-100; 2 mM EDTA; 20 mM Tris-HCl, pH 8.0), LiCl buffer (0.25 M LiCl; 1% NP-40; 10 mM Tris-HCl; pH 8.0, 1 mM EDTA), and TE buffer. The bead-bound chromatin was eluted with elution buffer (20 mM Tris-HCl, pH 8.0; 10 mM EDTA; 1% SDS) at 65 °C overnight. For MCM5 ChIP-seq, the soluble fractions were directly subjected to anti-MCM5 (1:50) enrichment at 4 °C overnight. Next, protein G dynabeads were added and the mixture was incubated for 1 h. The beads were washed using the same procedure used for the ORC2/5 ChIP, except that the wash with 2% SDS was not performed. The chromatin on the beads was eluted twice with elution buffer at 65 °C for 15 min.
The eluate fraction was digested by RNase A and incubated with Proteinase K overnight. DNA was purified, end-repaired, and subjected to poly-dC tailing as described previously [72]. After poly-dC tailing, DNA was tagged with a biotinylated primer, captured by Streptavidin C1 beads, and ligated with a selected bridge adapter (see Additional file 5: Table S4) overnight at room temperature. The ligated products were then washed and tagged with Illumina P5-I5 and P7-I7 sequences via PCR for 13 cycles. Sequencing was performed on the Illumina Hiseq platform (2 × 150 bp).
dCas9-mediated transcription elongation blockade
Four gRNAs (see Additional file 5: Table S4 for sequence details) binding to the non-template strand of the fourth exon of CMIP were designed in a 134-bp region and inserted into a single plasmid expressing nuclease-dead SpCas9 (dCas9) following the procedures for CRISPRi [73]. One million K562 cells transfected with 5 μg plasmids were arrested at the G1 phase by 5 μM palbociclib for 36 h. The G1-arrested cells were either released into the early S phase or grown through the G1 phase with a further incubation with HU and EdU for 12 h. The cells were harvested, after which early DNA replication initiation and the transcription level of the CMIP gene were measured.
For MCM5 ChIP-qPCR, 10 million K562 cells were transfected with the indicated plasmids, incubated with 5 μM palbociclib immediately, and cultured for 48 h. The G1-arrested cells were harvested and ChIP-ed as described in the ChIP-seq section, except that the MNase treatment was not performed. The ChIP eluate fractions were subjected to RNA removal, decrosslinking, and purification. Next, the resuspended DNA was subjected to qPCR detection with the indicated primer sets (see Additional file 5: Table S4 for sequence details).
To block the elongation of RNA polymerase II on GALNT10, eight gRNAs (see Additional file 5: Table S4 for sequence details) targeting the second exon of GALNT10 were designed in a 160-bp region and cloned into two plasmids expressing dCas9. The two plasmids were transfected into K562 cells, simultaneously, and transfected cells were treated as described above. Next, the cells were harvested, after which early DNA replication initiation and the transcription level of GALNT10 were measured.
Flow cytometry analysis
GM12878 or K562 cells treated with or without palbociclib were labeled with 50 μM BrdU for 30 min and fixed by 4% PFA for 1 h at 4 °C. The BrdU pulse-labeled cells were denatured by 3 M HCl, incubated with anti-BrdU (× 100, BD) for 40 min, and then stained with 7-AAD (× 250, BD) for 20 min. For EdU/HU-treated samples, EdU-labeled cells were subjected to the Click reaction following the manufacturer’s instructions (Click-iT EdU Alexa Flour 488 Flow Cytometry Assay Kit). Samples were acquired on a BD FACSVerse and analyzed with FlowJo 10.4.
Immunofluorescence
Cells were labeled with EdU or BrdU as described in the aforementioned sections and fixed with 4% PFA for 15 min, followed by a wash in PBS. Cells were denatured using 3 M HCl and neutralized in 0.1 M Na2B4O7. To detect the cross-reaction between anti-BrdU and EdU (Additional file 1: Figure S1d), denatured cells were Click-reacted with TAMRA-azide-biotin or DMSO, blocked with BSA, and incubated with anti-BrdU (BU1/75), followed by incubation with the indicated secondary antibody. For cell cycle progression detection (Additional file 1: Figure S1b), denatured samples were blocked with 5% BSA, incubated with an antibody against BrdU (BU1/75) for 1 h followed by a thorough wash, and then incubated with the indicated secondary antibody. For γ-H2AX detection (Additional file 1: Figure S4c), G1-arrested K562 cells were released into fresh medium containing 10 mM HU plus 10 μM EdU with or without amanitin treatment for 12 h. The fixed cells were Click-reacted with Azide-Alexa-555 and then incubated with antibodies against γ-H2AX and RNA polymerase II simultaneously. For the early S phase samples used in the DRB removal assay, G1-arrested K562 cells were released into fresh medium supplied with 10 mM HU, 50 μM BrdU, and 30 μM DRB for 12 h. Next, the cells were washed and released into fresh medium containing 10 mM HU for 3.5 h. For the G1 phase samples used in the DRB removal assay, G1-arrested K562 cells were cultured in the presence of 30 μM DRB for 12 h and released into fresh medium containing 5 μM palbociclib for an additional 3.5 h. Next, the fixed cells were denatured and incubated with antibodies against BrdU or γ-H2AX. For the RNase H1 overexpression assay, cells were transfected with plasmids containing GFP or GFP-RNase H1 before cell arrest was induced via palbociclib. Images were acquired using an LSM 710 NLO & DuoScan System (× 63, 1.4 NA or × 40, 1.2 NA) and analyzed using ImageJ according to the instructions included with the software.
Quantification and statistical analysis
Sequencing data analysis
NAIL-seq data processing
R1 and R2 reads were stitched using PEAR [74] and the bridge adapter sequences were trimmed with custom scripts. The stitched reads were aligned to the human genome (assembly hg19) or mouse genome (assembly mm10) by BWA-MEM (-k 20). PCR duplicates were distinguished by random molecular barcodes (RMBs) and only one was kept. Reads with mapping quality (MAPQ) ≥ 30 were kept for further analysis. We used samtools to convert .sam files into .bam files.
ChIP-seq data processing
First, we trimmed the adapter sequences in the reads using cutadapt (-g NGGGGGGGGG -g GGGGG --times=2 --minimum-length=70 --error-rate=0.2 -O 3) [75]. Next, we aligned reads to the human genome (assembly hg19) using BWA-MEM (-k 20). For MCM ChIP-seq, we first generated RPKM to normalize the total reads by converting alignment files in .bam to .BigWig format using bamCoverage from deeptools [76] with parameters of -bs 1000 and --normalizeUsing RPKM. To compare with the control, the fold change of the RPKM signal between the MCM ChIP-seq and the input control was generated for further analysis. For ORC2 ChIP-seq, peaks were identified by MACS 1.4.2 using -p 1e-5 --nolambda --nomodel --keep-dup=all [77]. The peaks within 500 bp were merged using BEDTools 2.27.0 [78]. Peaks within 500 bp from two biological replicates without ⍺-amanitin treatment were defined as reproducible ORC2 peaks and used for further analysis. Genome-wide ORC2 tracks in .BigWig format were generated by bamCoverage from deeptools with -bs 50 and –normalizeUsing RPKM.
Identifying early replication initiation zones (ERIZs) from NAIL-seq
We developed a pipeline named “RepFind” for NAIL-seq analysis, which is described in the following section.
Defining E-B peaks from EdU/BrdU-labeled NAIL-seq samples
Both EdU and BrdU reads were normalized by reads per million (RPM) in 5-kb bins tiling the whole genome and marked as E and B, respectively. ΔEB was defined as: ΔEB=E-B. Bins, with ΔEB > 0.3, were taken as seeds and the neighbor bins with ΔEB > 0 were merged with the seeds to mark EdU-rich regions. The E-B peaks were the EdU-rich regions whose lengths were > 20 kb and fell in EdU peaks (defined from EdU reads using SICER with options -w 5000 -g 3). Peaks that occurred in at least two biological replicates were kept as bona fide E-B peaks.
Defining EdU/HU peaks
The peak calling procedure was modified as reported previously [15]. Peaks were called by MACS (version 1.4.2) (-p 1e-5 --nolambda --nomodel --keep-dup=all). Next, we filtered the peaks with ≥ 400-fold enrichment against a random Poisson distribution generated from MACS14. Neighbor peaks were merged if their interval was shorter than 10 kb for K562 or 20 kb for GM12878. The merged peaks with < 10 kb size were discarded. Peaks in ChrY, mitochondria DNA, and blacklist regions were omitted.
Defining ERIZs
The ERIZs were defined as EdU/HU peaks that overlapped with E-B peaks, while non-ERIZs were defined as EdU/HU peaks that did not overlap with E-B peaks. The replication timing value was obtained from wavelet-smoothed signals from six fractions of the ENCODE Repli-seq profile. The mean value of replication timing for each region was calculated by bwtool summary. The distribution was plotted by ggplot2 histogram. Domains with a replication timing value > 0.5 were defined as early replication domains, and other domains were classified as late replication domains (Additional file 1: Figure S2c).
Comparing OK-seq and SNS-seq with NAIL-seq
OK-seq data from K562 cells were analyzed using the BAMscale pipeline with default settings (https://github.com/ncbi/BAMscale/wiki/Detailed-Use:-OKseq-RFD-(Replication-Fork-Directionality)-Track-Generation). Replication initiation zones (IZs) were identified by using OKseq_switches.R with default settings (https://github.com/ncbi/BAMscale/wiki/Finding-OK-seq-strand-switched-from-the-RFD-track). Replication origins identified by SNS-seq were downloaded from GSE46189 (GSE46189_Ori-Peak.bed.gz). The overlaps between ERIZs/OK-seq and ERIZs/SNS-seq were calculated using intersect in Bedtools.
Visualization of sequencing data
To generate genome tracks, we converted the .bam files into .BigWig files. For NAIL-seq and ORC2 ChIP-seq, files were generated by bamCoverage (in deepTools) with the parameter --binSize 1000 --normalizeUsing RPKM. For MCM5 ChIP-seq, the fold change value of the ChIP-ed samples over the input was generated by bamCompare with parameter --binSize 1000, --normalizeUsing RPKM –operation ratio. The data were presented by IGV 2.4.14 [79].
All heatmaps were generated by the R package “EnrichedHeatmap” [80]. For the non-TR heatmaps (see the “GRO-seq analysis” section for region definition), the 20–100-kb non-TRs were centered at the midpoint and ordered by increasing width, and the ±100-kb regions from the midpoint were used for display. Each region was binned with a 1-kb window filled with the mean value of each signal. For early replication or ORC2 ChIP-seq, the signal was generated by RPKM. Specially for MCM5 ChIP-seq, the z-score-transformed fold change value was displayed. For the transcribed gene-related heatmaps, ERIZ-flanked transcribed regions (defined by TR in the “GRO-seq” analysis) larger than 50 kb were ranked by gene width, with the smallest at the top. For display, all transcribed regions were scaled to the same width and aligned at both TSS and TTS. Additional 50-kb regions upstream or downstream of the TSS or TTS, respectively, were shown. The signal processing was performed in the same manner as that performed for the non-TR-related heatmaps.
A/B compartment analysis
GM12878 and K562 HiC data (.hic file) were downloaded from GSE63525 [1]. Next, eigenvector from juicer [1] (KR BP 100000) was used to identify A/B compartments in GM12878 or K562 cells with 100-kb bins. The arbitrarily assigned “+” or “–” from eigenvector was corrected by defining “+” as an eigenvector positively correlated with an active histone marker (such as H3K27ac). The A compartments were defined as regions with a positive corrected value (“+”) for the eigenvector.
Prediction of ERIZ locations using epigenetic features
To predict ERIZ locations using epigenetic features within A compartments, we built a logistic regression model based on the peaks of the ENCODE ChIP-seq data (Fig. 2b; Additional file 1: Figure S3a). First, we converted the location of each ERIZ into a binary format using ChromHMM with 50-kb windows within an A compartment, where the window overlapping with the ERIZ is designated as “1” (“ERIZ-1”), and all others are designated as “0” (“ERIZ-0”) [81]. Second, to generate the epigenetic predictor matrix for ERIZs, we defined the enrichment score of each 50-kb window using the log odds ratio of the observed and expected occurrence of ChIP-seq peaks, where the observed occurrence was defined as the proportion of 1-kb bins that overlapped with ChIP-seq peaks. The expected occurrence was estimated as the mean occurrence of the random peaks shuffled within the same chromosome. Third, we subsampled bins from the “ERIZ-0” bins to balance the sample size of the “ERIZ-1” bins. Finally, we built the logistic regression model using “glm” in R. The coefficients were used to evaluate the contribution to the prediction of ERIZ location.
GRO-seq analysis
K562 and GM12878 GRO-seq data from a previous report were acquired for re-analysis [45]. For activated mouse B cells and mES V6.5 cells, GRO-seq data were acquired from published reports [15, 69]. Adapter-trimmed reads were aligned using bowtie (-l 25 -v 1 -k 1 -m 1 -S -q --best) [82]. Reads mapped to rRNA were omitted. For K562 and GM12878 cells, the read density (reads per kilobase, RPK) normalized to 10 million sequencing reads in the gene bodies or promoter regions was calculated via analyzeRepeats.pl from Homer (analyzeRepeats.pl rna hg19 -count genes -condenseGenes -strand + -norm 1e7) [83]. For activated mouse B cells and mES V6.5 cells, the read density was obtained from Homer (analyzeRepeats.pl rna mm10 -count genes -condenseGenes -strand + -norm 1e6).
Defining active genes
Active genes in K562 or GM12878 cells were defined as those with read density > 0 at the promoter region and RPK > 4 at the gene body from the GRO-seq data. Silent genes were defined as those with no read at the promoter region and RPK ≤ 1 at the gene body. Mouse active genes in activated B cells and mES V6.5 cells were defined as those with RPK > 1. Silent genes were genes with zero RPK from the GRO-seq data.
Defining non-transcribed regions (non-TRs) and transcribed regions (TRs)
The non-transcribed regions used in heatmaps were defined as interval regions between two active genes within A compartments via bedtools subtract. Non-transcribed regions with a size of 20–100 kb that overlapped with ERIZs were used for the non-TR-related heatmaps. The active genes flanked by ERIZ-occupied non-TRs were defined as transcribed regions (TRs) for display in heatmaps (Fig. 4f–h).
Comparing signal enrichment in non-transcribed regions and transcribed regions
The non-transcribed regions were further trimmed by 200 bp at the head and tail to exclude the TSS and TTS regions (Figs. 3b, d, f and 4c, e; Additional file 1: Figure S4f). The flanked transcribed regions were also trimmed as active gene body regions (TSS + 200 bp, TTS − 200 bp). The read density was calculated as follows: for each non-transcribed region, reads in the region were counted and then normalized by region length. Read counts in transcribed regions upstream or downstream of non-transcribed regions were summed and normalized by the summed length of the transcribed regions. In order to compare the signal enrichment of non-transcribed and transcribed regions based on NAIL-seq and ORC2 ChIP-seq, the log2 fold change of the ratio of the read density in the non-transcribed region to that of the transcribed region was calculated to allow quantification of differences. For MCM ChIP-seq, the fold enrichment of the read density relative to that of the input control was calculated and used to calculate the log2 fold change in the non-transcribed regions and transcribed regions.
Analysis of dCas9 as a transcription barrier
To compare the read density adjacent to the dCas9 binding sites, the RPK ratio was defined as follows: the read count per kilobase (RPK) was calculated in each bin and then divided by the mean RPK of the B compartments at the same chromosome to normalize the background of each sample. For the CMIP locus, the ± 5-kb region centered on the dCas9 binding site was tiled in 1-kb bins (with a 200-bp sliding) for calculation of the RPK ratio (Fig. 5b, d). A 2-kb bin upstream or downstream of the dCas9 binding site was used to calculate the RPK ratios shown in Fig. 5c, e. For the GALNT10 locus, the ± 6-kb region centered on the dCas9 binding sites was tiled in 3-kb bins (with a 300-bp sliding) (Additional file 1: Figure S6h).
Statistical analysis
We performed Student’s t test to test the statistical significance between RPK ratios for dCas9 with gRNA-CMIP and dCas9 with scrambled gRNA. The t-test was applied to assess the difference between the dCas9-mediated transcription blockade and control groups in the MCM5 ChIP-qPCR assay. The t-test was also used to analyze the distribution of γ-H2AX intensity with or without DRB treatment. The Wilcoxon rank-sum test was performed to analyze the differences between the log fold change NAIL-seq signal and ORC2 or MCM5 ChIP-seq signals in non-transcribed regions in comparison with transcribed regions among the groups subjected to different treatments with transcription inhibitors.