Fig. 1 | Genome Biology

Fig. 1

From: A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods

A benchmark of candidate enhancer-gene interactions (BENGI). a Experimental datasets used to curate BENGI interactions, categorized by 3D chromatin interactions, genetic interactions, and CRISPR/Cas9 perturbations. b Methods of generating cCRE-gene pairs (dashed straight lines in green, shaded green, or red) from experimentally determined interactions or perturbation links (dashed, shaded arcs in red, pink, or gold). Each cCRE-gene pair derived from 3D chromatin interactions (top panel) has a cCRE-ELS (yellow box) intersecting one anchor of a link, and the pair is classified according to the other anchor of the link: for a positive pair (dashed green line), the other anchor overlaps one or more TSSs of just one gene; for an ambiguous pair (dashed line with gray shading), the other anchor overlaps the TSSs of multiple genes; for a negative pair (dashed red line), the other anchor does not overlap a TSS. Each cCRE-gene pair derived from genetic interactions or perturbation links (middle and bottom panels) has a cCRE-ELS (yellow box) intersecting an eQTL SNP or a CRISPR-targeted region. The pair is classified as positive (dashed green line) if the gene is an eQTL or crisprQTL gene, while all pairs that this cCRE forms with non-eQTL genes that have a TSS within the distance cutoff are considered negative pairs (dashed red line). c To reduce potential false positives obtained from 3D interaction data, we implemented a filtering step to remove ambiguous pairs (gray box in b) that link cCREs-ELS to more than one gene. This filtering step was not required for assays that explicitly listed the linked gene (eQTLs and crisprQTLs). Additionally, for comparisons between BENGI datasets, we also curated matching sets of interactions with a fixed positive-to-negative ratio. Therefore, a total of four BENGI datasets were curated for each 3D chromatin experiment (A, B, C, D), and two were curated for each genetic interaction and CRISPR/Cas9 perturbation experiment (A, B). d To avoid overfitting of machine-learning algorithms, all cCRE-gene pairs were assigned to cross-validation (CV) groups based on their chromosomal locations. Positive and negative pairs on the same chromosome were assigned to the same CV group, and chromosomes with complementary sizes were assigned to the same CV group so that the groups contained approximately the same number of pairs.
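
The classification rules in panel b and the chromosome-based grouping in panel d can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the input shapes, and the `n_groups` parameter are hypothetical, and the greedy balancing step is an assumed stand-in for the caption's "complementary sizes" pairing, whose exact procedure is not given here.

```python
from typing import Dict, Sequence, Set


def classify_3d_link(genes_at_other_anchor: Sequence[str]) -> str:
    """Label a 3D chromatin link whose first anchor intersects a cCRE-ELS.

    `genes_at_other_anchor` lists the genes with a TSS overlapping the
    other anchor (hypothetical input; duplicates are tolerated).
    """
    n_genes = len(set(genes_at_other_anchor))
    if n_genes == 1:
        return "positive"   # TSSs of exactly one gene
    if n_genes > 1:
        return "ambiguous"  # TSSs of multiple genes; filtered out in panel c
    return "negative"       # no TSS overlaps the other anchor


def classify_qtl_pairs(linked_genes: Set[str],
                       genes_within_cutoff: Set[str]) -> Dict[str, str]:
    """Label pairs for a cCRE-ELS hit by an eQTL SNP or CRISPR-targeted region.

    Linked (eQTL or crisprQTL) genes are positives; every other gene with a
    TSS within the distance cutoff is a negative.
    """
    labels = {gene: "positive" for gene in linked_genes}
    for gene in genes_within_cutoff - linked_genes:
        labels[gene] = "negative"
    return labels


def assign_cv_groups(pairs_per_chrom: Dict[str, int],
                     n_groups: int) -> Dict[str, int]:
    """Assign whole chromosomes to CV groups with roughly equal pair counts.

    Assumption: a greedy heuristic (largest chromosome first, always into the
    currently smallest group) approximates the balancing the caption describes.
    """
    totals = [0] * n_groups
    assignment: Dict[str, int] = {}
    ordered = sorted(pairs_per_chrom.items(), key=lambda kv: (-kv[1], kv[0]))
    for chrom, count in ordered:
        group = min(range(n_groups), key=lambda i: totals[i])
        assignment[chrom] = group
        totals[group] += count
    return assignment
```

Because every pair on a chromosome inherits that chromosome's group, positive and negative pairs from the same chromosome never end up on opposite sides of a train/test split.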
