Fig. 1 | Genome Biology

Fig. 1

From: Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome

Virtual ChIP-seq learns from association of gene expression and chromatin factor binding at each genomic bin. This example shows Virtual ChIP-seq analysis for the MYC TF.
a Gene expression levels for 5000 genes in 12 cell types. For simplicity of visualization, we showed only 100 of these genes in the matrix and labeled only one quarter of the genes. We ranked RNA-seq RPKM expression values within each cell type. This matrix shows a subset of 5000 high-variance genes, sorted by variance of each gene’s expression between cell types. Blue: row minimum; white: median expression; red: row maximum.
b ChIP-seq signal for 100 bp bins in 12 cell types, taken from four larger regions (25 bins each) on chromosome 5. We quantile-normalized ChIP signal from MACS software among cell types. This matrix shows a subset of the 54,037 bins on chromosome 5 which have TF binding in at least one training cell type. White: column minimum (0.0); black: column maximum (1.0). Cyan: a region in the NREP locus with MYC binding in GM12878; magenta: a region upstream of SLC22A4 with MYC binding in K562.
c Association matrix: gene expression–ChIP signal correlation between 100 genomic bins and 5000 high-variance genes. This is a subset of the larger 54,037 × 5000 association matrix for chromosome 5. Each cell shows the Pearson correlation for 12 cell types between expression for a particular gene and ChIP signal at a particular genomic bin. Orange: negative correlation; white: p-value of Pearson correlation greater than 0.1 (NA); purple: positive correlation.
d (Top) Expression score plots for a 100 bp bin in the NREP locus. Each plot has one point for each of 184 genes with non-NA correlation values at that bin in the association matrix. Each point displays the rank of correlation value for that gene among one row of the association matrix against the rank of expression for that gene among 5000 high-variance genes in (left) GM12878 and (right) K562 cell types. The expression score at a bin for a cell type is Spearman’s rank correlation coefficient ρ between those two ranks. Blue line: best linear fit to data; grey region: 95% confidence interval of the fit. (Bottom) UCSC Genome Browser display of 550 bp around that region. Blue rectangle: MYC ChIP-seq peak in GM12878 or K562. Here, MYC binds only in GM12878.
e Expression score plot and Genome Browser display for a 100 bp bin upstream of SLC22A4. Here, MYC binds only in K562.
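The association matrix (panel c) and the expression score (panels d and e) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `association_matrix` and `expression_score`, the array layouts, and the `min_genes` guard are assumptions made for illustration, and the sketch assumes that ranked expression and normalized ChIP signal are already available as NumPy arrays. It also assumes that Spearman's ρ is computed only over genes with non-NA correlation at the bin, which is how the caption reads.

```python
import numpy as np
from scipy import stats


def association_matrix(chip, expr, p_max=0.1):
    """Pearson correlation between ChIP signal and gene expression.

    chip: array of shape (n_cell_types, n_bins), normalized ChIP signal.
    expr: array of shape (n_cell_types, n_genes), ranked expression.
    Returns an (n_bins, n_genes) array of Pearson r, with NaN wherever
    the correlation p-value exceeds p_max (the "NA" cells in panel c).
    """
    n_bins, n_genes = chip.shape[1], expr.shape[1]
    assoc = np.full((n_bins, n_genes), np.nan)
    for b in range(n_bins):
        if np.ptp(chip[:, b]) == 0:  # constant signal: correlation undefined
            continue
        for g in range(n_genes):
            if np.ptp(expr[:, g]) == 0:  # constant expression: undefined
                continue
            r, p = stats.pearsonr(chip[:, b], expr[:, g])
            if p <= p_max:
                assoc[b, g] = r
    return assoc


def expression_score(assoc_row, expr_new, min_genes=3):
    """Expression score for one bin in one (new) cell type.

    assoc_row: one row of the association matrix (length n_genes).
    expr_new: expression of the same genes in the cell type of interest.
    Returns Spearman's rho between the correlation values and the
    expression ranks, using only genes with non-NA correlation.
    min_genes is a hypothetical guard against tiny samples.
    """
    keep = ~np.isnan(assoc_row)
    if keep.sum() < min_genes:
        return np.nan
    # Rank expression among all high-variance genes, then subset.
    expr_rank = stats.rankdata(expr_new)
    rho, _ = stats.spearmanr(assoc_row[keep], expr_rank[keep])
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chip = rng.random((12, 5))    # 12 cell types, 5 toy bins
    expr = rng.random((12, 40))   # 12 cell types, 40 toy genes
    assoc = association_matrix(chip, expr)
    scores = [expression_score(row, rng.random(40)) for row in assoc]
    print(assoc.shape, len(scores))
```

The nested loop is written for clarity rather than speed; at the scale in the caption (54,037 bins by 5000 genes on chromosome 5) a vectorized correlation would be needed in practice.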
