Fig. 2 | Genome Biology

Fig. 2

From: FOCS: a novel method for analyzing enhancer and gene activity patterns infers an extensive enhancer–promoter map

Performance of three alternative regression methods for inferring E–P models. a Performance of ordinary least squares (OLS), generalized linear model with negative binomial distribution (GLM.NB), and zero-inflated negative binomial (ZINB) regression using the binary test. A point (x, y) on a plot indicates that a fraction x of the models had −log10[q-value] < y, with q-values computed by the Wilcoxon rank sum test. OLS yields a higher fraction of validated models at any q-value cutoff. b Same as a but using the activity level validation test, with p values computed by the Spearman correlation test. Here too, OLS yields a higher fraction of validated models than the other methods. c Number of promoters whose OLS models passed (at q < 0.1) each of the tests (or none). d The distribution of the number of positive samples (samples in which the promoter is active, i.e., has RPKM ≥ 1) for promoters in each category. e Comparison between the \( R^2 \) values with and without cross-validation (CV). Each dot is a promoter model. Blue dots denote models with \( R^2 \ge 0.5 \) and \( {R}_{CV}^2\ge 0.25 \). Red dots denote models with \( R^2 > 0.5 \) and \( {R}_{CV}^2<0.25 \), corresponding to over-fitted models with low predictive power on novel samples. f A promoter whose model, as computed without CV, has a very high \( R^2 \) (left plot) but a low \( {R}_{CV}^2 \) when CV is applied (right plot). This example demonstrates the sensitivity of \( R^2 \) (and Pearson correlation) to outliers. \( \rho_s \): Spearman correlation; Q-value: FDR-corrected p value
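
The contrast drawn in panels e and f, a high in-sample R2 that collapses under cross-validation because of a single outlier, can be reproduced on toy data. The following is a minimal sketch, not the FOCS implementation: it assumes a single regressor (the caption does not state how many enhancers enter each promoter model), uses leave-one-out CV as a stand-in for whatever CV scheme the article applies, and defines the cross-validated R2 as 1 − SS_res/SS_tot on held-out predictions. The data values and the helper names `fit_ols`, `r_squared`, and `loo_predictions` are hypothetical.

```python
import numpy as np


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all other samples."""
    preds = np.empty_like(y)
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # hold out sample i
        a, b = fit_ols(x[keep], y[keep])
        preds[i] = a + b * x[i]
    return preds


# Toy data (assumed): ten samples with no real x-y relationship,
# plus one extreme outlier at (50, 50) that dominates the fit.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 50], dtype=float)

a, b = fit_ols(x, y)
r2 = r_squared(y, a + b * x)               # in-sample fit
r2_cv = r_squared(y, loo_predictions(x, y))  # held-out predictions

print(f"R2 without CV: {r2:.3f}")
print(f"R2 with leave-one-out CV: {r2_cv:.3f}")
```

On this toy data the in-sample R2 is above 0.9, while the cross-validated value falls below zero: once the outlier is held out, the remaining samples give no basis for predicting it. Under the thresholds in panel e, such a model would be drawn as a red dot.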