Fig. 1 | Genome Biology

Fig. 1

From: STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells

Comparison of genome-wide STARR-seq and active chromatin in mouse ESCs defines three classes of regulatory elements. a Experimental setup of genome-wide STARR-seq. ESC DNA is sonicated into random fragments with a median size of 850 bp. Adapter-ligated fragments are cloned behind the SCP minimal promoter and transfected into ESCs. STARR-seq signal represents the number of transcribed fragments divided by sequenced input DNA in a “peak” region. Bayesian shrinkage was applied to penalize the signal of loci with low read counts (see the “Methods” section). b Genomic distribution of the significant STARR-seq peaks (enrichment ≥ 3, p < 0.05) detected in 2iL or SL. 18,116 enhancers were detected in 2iL and 18,543 in SL. 7073 enhancers were only detected in 2iL and 7500 enhancers were only detected in SL. The union of STARR-seq peaks detected in 2iL and SL comprises 25,616 enhancer loci. c Top: Putative enhancers (APK loci) were defined as the intersection of ATAC-seq, P300, and H3K27ac ChIP-seq peaks. For each mark, the peaks found in the union of 2iL- and SL-ESCs were taken. The union of all the APK loci (present in 2iL or SL) and all the STARR-seq peaks (2iL or SL) was classified by EpiCSeg (see the “Methods” section). Bottom: We defined loci as C1: STARR-seq and active chromatin; C2: only STARR-seq; or C3: active chromatin, but no STARR-seq. C3-loci near a Gencode or Refseq TSS were discarded. [PD]. d Heatmap of STARR-seq and enhancer marks in 2iL- and SL-ESCs. Signal was computed on the STARR-seq (C1 and C2) or ATAC-seq (C3) peak flanked by 3 kb. Regions were clustered by class and ranked by decreasing STARR-seq signal (log2 RPKM) in 2iL. The signal intensity (log2 RPKM) was capped at 75% of the maximum value to enhance visualization. [PD]. e Genome browser view for selected C1- (top), C2- (middle), and C3- (bottom) loci. The STARR-seq track depicts the enrichment over input. For the other tracks, the signal is shown in RPKM. Orange boxes denote the regions tested by luciferase assay (see Table S3 for primers and locations). [PD]. f STARR-seq and luciferase signal for regions shown in e. Luciferase signal is defined as the Firefly over Renilla (F/R) ratio scaled to the F/R value of an empty vector. These values were log2 transformed and linearly scaled to STARR-seq log2 enrichment values (see the “Methods” section). Error bars denote the standard deviation for biological duplicates (STARR-seq) or technical triplicates (luciferase). g STARR-seq enrichment and luciferase signal (as in f) for n = 39 selected loci in 2iL- and SL-ESCs. Points denote the mean of biological duplicates (STARR-seq) or technical triplicates (luciferase). PCC: Pearson’s correlation coefficient. h Enrichment of known DNA motifs at C1-, C2-, and C3-loci relative to a GC%-, size-, and input-matched background set (see the “Methods” section). The bars (top) depict the number of STARR-seq peaks detected per class. Both the class definition and the STARR-seq peaks are condition-specific (2iL or SL). TFs with similar motifs were grouped. P values: Homer2 binomial test with Benjamini-Hochberg correction. Some of the panels in these figures contain public data. These panels are annotated with [PD]. The accession numbers of public data and their corresponding panels are annotated in Additional file 2: Table S1.
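
The peak counts in panel b and the class definitions in panel c can be checked with a short sketch. This is an illustration only, not the authors' pipeline: the function names and the boolean inputs are hypothetical, the active-chromatin call is reduced to a single flag (the caption assigns it through EpiCSeg), and the count check assumes that every locus shared between 2iL and SL is counted once in each condition.

```python
# Peak counts as reported in the caption for panel b.
N_2IL = 18_116      # STARR-seq enhancers detected in 2iL
N_SL = 18_543       # STARR-seq enhancers detected in SL
ONLY_2IL = 7_073    # detected only in 2iL
ONLY_SL = 7_500     # detected only in SL


def shared_and_union(n_a, n_b, only_a, only_b):
    """Return (shared, union) for two overlapping peak sets.

    Assumption: a shared locus contributes exactly one peak to each set,
    so the shared count derived from either side must agree.
    """
    shared_a = n_a - only_a
    shared_b = n_b - only_b
    if shared_a != shared_b:
        raise ValueError("reported counts are not mutually consistent")
    return shared_a, shared_a + only_a + only_b


def classify_locus(has_starr, has_active_chromatin, near_tss=False):
    """Assign a locus to C1, C2, or C3 following the caption's definitions.

    Hypothetical helper: the boolean flags stand in for the peak overlap
    and chromatin-state calls, which the caption does not spell out.
    Returns None for loci that fall in no class or are discarded.
    """
    if has_starr and has_active_chromatin:
        return "C1"  # STARR-seq and active chromatin
    if has_starr:
        return "C2"  # only STARR-seq
    if has_active_chromatin:
        # C3-loci near a Gencode or Refseq TSS were discarded.
        return None if near_tss else "C3"
    return None


shared, union = shared_and_union(N_2IL, N_SL, ONLY_2IL, ONLY_SL)
print(shared, union)  # 11043 25616
```

Under the stated assumption, the four reported counts imply 11,043 shared loci and reproduce the union of 25,616 enhancer loci given in the caption.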
