Fig. 4 | Genome Biology

Fig. 4

From: Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data

External validations of our EnTDef ranking. A Violin plots of average F1 scores for different types of EnTDefs (top 10 EnTDefs with 5 kb locus definition [EnTDef.top_plus5kb], top 10 EnTDefs, middle 10 EnTDefs [ranked at 732–741], and bottom 10 EnTDefs) in BENGI datasets with “ambiguous pairs” removed (“BENGI_fixedRatio”: positive and negative pairs in a 1:4 ratio; “BENGI_naturalRatio”: originally generated negative pairs). The values for the EnTDefs ranked first by GSE F1 score, with or without the 5 kb locus definition, are marked by a red star, and the values for the baseline locus definitions (nearest TSS [nearest_tss] and >5 kb [5kb_outside]) are marked by dashed lines (red: nearest TSS, blue: >5 kb). An ANOVA test was performed between the top 10 EnTDefs and nearest TSS: p-value = 2.74 × 10⁻¹⁰⁴ in “BENGI_fixedRatio” and p-value = 5.48 × 10⁻³ in “BENGI_naturalRatio”. B Ranking comparison of the top 10 with 5 kb (red), top 10 (dark red), middle 10 (ranked at 732–741) (coral), and bottom 10 (blue) EnTDefs, nearest TSS (orange), and >5 kb (magenta), based on GSE testing-derived F1 score (x-axis) vs. BENGI benchmarking-derived F1 score (y-axis). Pearson’s correlation (R), the regression equation, and the p-values are shown for each type of BENGI benchmark dataset. C, D Ranking comparison of the top 10 (red), middle 10 (ranked at 732–741) (green), and bottom 10 (blue) EnTDefs based on average F1 score vs. overlap coefficient, as compared to C five computationally derived enhancer-gene pair datasets (FOCS, GeneHancer, JEME, PEGASUS, and RIPPLE) and D two experimental datasets (HACEER, RB)
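
The two metrics named in the caption can be illustrated with a minimal sketch. It assumes that enhancer-gene pairs are represented as sets of (enhancer, gene) tuples, that F1 is computed only over benchmark pairs labelled positive or negative, and that the overlap coefficient takes its standard form |A ∩ B| / min(|A|, |B|). The function names and toy pairs below are hypothetical and are not taken from the article or from BENGI.

```python
def f1_score(predicted, positives, negatives):
    """F1 of predicted enhancer-gene pairs against labelled benchmark pairs.

    Assumption: pairs absent from both `positives` and `negatives`
    are unlabelled and ignored.
    """
    tp = len(predicted & positives)   # predicted and labelled positive
    fp = len(predicted & negatives)   # predicted but labelled negative
    fn = len(positives - predicted)   # labelled positive but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def overlap_coefficient(a, b):
    """Overlap coefficient |A ∩ B| / min(|A|, |B|) of two pair sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy data (hypothetical identifiers, for illustration only).
predicted = {("enh1", "GENE_A"), ("enh2", "GENE_B"), ("enh3", "GENE_C")}
positives = {("enh1", "GENE_A"), ("enh2", "GENE_B"), ("enh4", "GENE_D")}
negatives = {("enh3", "GENE_C"), ("enh5", "GENE_E")}

f1 = f1_score(predicted, positives, negatives)
oc = overlap_coefficient(predicted, positives)
print(f"F1 = {f1:.3f}, overlap coefficient = {oc:.3f}")
```

In this toy case two of three predictions are true positives and one positive is missed, so precision and recall are both 2/3. The overlap coefficient divides by the size of the smaller set, so it is not penalised when one dataset is much larger than the other.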