Fig. 2 | Genome Biology

Fig. 2

From: Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers

Enformer provides effective gene expression prediction for endogenous genes. A Pearson correlation between predicted and measured log-transformed expression on GTEx tissues for different models. Enformer predicts endogenous RNA abundance, as measured in adult tissues (GTEx [15]), better than previous models. Adding the exon-intron ratio, a (weak) proxy of RNA half-life, as an additional predictor slightly improves performance. B Same as A for Enformer predictions on developmental samples (Cardoso-Moreira et al. [16] dataset). Enformer predicts endogenous gene expression very well overall, yet somewhat worse for later stages of development. C Distribution of deviations of GTEx measured log expression values from (1) the global mean (across genes and tissues, blue), (2) the gene mean (across tissues, red), and (3) the Enformer prediction (green). The first indicates the overall variation in expression, the second indicates the between-tissue variation, and the third indicates the magnitude of Enformer's errors. Enformer's accuracy is sufficient to explain much of the between-gene variation, but not the variation of individual genes between tissues. D Measured between-tissue deviations of gene expression plotted against predicted deviations. Enformer predicts large between-tissue changes in expression reasonably well on average, but there is significant room for improvement. The numbers indicate the percentages of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) when predicting 2-fold changes (black lines).
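
The TP/FP/FN/TN percentages described for panel D can be illustrated with a minimal sketch. This is not the authors' code, and it rests on several assumptions: the function name `between_tissue_confusion` and its parameters are hypothetical; inputs are genes-by-tissues matrices of log-transformed expression; the log base is taken to be 2 (the caption does not state it); a between-tissue deviation is the difference from the gene's mean across tissues; and a deviation counts as a "change" when its absolute value reaches the 2-fold threshold, ignoring direction.

```python
import numpy as np


def between_tissue_confusion(measured, predicted, fold=2.0, log_base=2.0):
    """Percentages of TP/FP/FN/TN for calling >= `fold` between-tissue changes.

    Hypothetical helper: rows are genes, columns are tissues, values are
    log-transformed expression in base `log_base` (assumed, not from the source).
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Between-tissue deviation: subtract each gene's mean across tissues.
    meas_dev = measured - measured.mean(axis=1, keepdims=True)
    pred_dev = predicted - predicted.mean(axis=1, keepdims=True)

    # A 2-fold change is log(2) in natural log, converted to the chosen base.
    threshold = np.log(fold) / np.log(log_base)

    # Simplification: direction of the change is ignored.
    meas_pos = np.abs(meas_dev) >= threshold
    pred_pos = np.abs(pred_dev) >= threshold

    total = meas_pos.size
    counts = {
        "TP": np.sum(meas_pos & pred_pos),
        "FP": np.sum(~meas_pos & pred_pos),
        "FN": np.sum(meas_pos & ~pred_pos),
        "TN": np.sum(~meas_pos & ~pred_pos),
    }
    # Convert counts to percentages of all gene-tissue pairs.
    return {name: 100.0 * float(count) / total for name, count in counts.items()}


# Toy data: two genes measured in three tissues (log2 scale).
example = between_tissue_confusion(
    measured=[[0, 2, 4], [1, 1, 1]],
    predicted=[[0, 2, 4], [0, 0, 3]],
)
```

In this toy input the second gene is constant across tissues in the measurement but varies in the prediction, so all of its predicted changes are false positives. A stricter variant could additionally require the predicted and measured deviations to share the same sign before counting a true positive.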
