Fig. 4 | Genome Biology

Fig. 4

From: MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

Analysis of DMS data for Aβ and TDP-43. (a, b) Seuma et al. [9] measured nucleation scores for 499 single mutants and 15,567 double mutants of Aβ. These data were used to train a latent phenotype model comprising (a) an additive G-P map and (b) a GE measurement process with a heteroscedastic skewed-t noise model. (c, d) Bolognesi et al. [10] measured toxicity scores for 1266 single mutants and 56,730 double mutants of TDP-43. The resulting data were used to train (c) an additive G-P map and (d) a GE measurement process of the same form as in panel b. In both cases, data were split 90:5:5 into training, validation, and test sets. In a and c, gray dots indicate the wildtype sequence and * indicates a stop codon. White squares [355/882 (40.2%) for Aβ; 433/1764 (24.5%) for TDP-43] indicate residues that were not observed in the training set and thus could not be assigned values for their additive effects. Amino acids are ordered as in the original publications [9, 10]. In b and d, blue dots indicate latent phenotype values plotted against measurements for held-out test data. The gray line indicates the latent phenotype value of the wildtype sequence. The solid orange line indicates the GE nonlinearity, and the dotted orange lines indicate the corresponding 95% PI for the inferred noise model. Values for \(I_\text{var}\), \(I_\text{pre}\), and \(R^2\) (between \(y\) and \(\hat{y}\)) are also shown. Uncertainties reflect standard errors. Additional file 1: Fig. S2 shows measurements plotted against the \(\hat{y}\) predictions of these models. DMS: deep mutational scanning; Aβ: amyloid beta; TDP-43: TAR DNA-binding protein 43; G-P: genotype-phenotype; GE: global epistasis; PI: prediction interval
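
As a rough illustration of the 90:5:5 split and the white-square percentages quoted in the caption, the following Python sketch partitions variant indices at random and recomputes the fractions. The function name `split_indices`, the fixed seed, and the shuffle-then-slice strategy are assumptions made for illustration only; the caption does not state how the authors assigned variants to each set.

```python
import numpy as np


def split_indices(n, val_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle range(n) and slice it into training, validation, and test indices.

    Hypothetical helper: the fractions default to the 90:5:5 split named in the
    caption, but the shuffling scheme and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_val = int(round(val_frac * n))
    n_test = int(round(test_frac * n))
    val = order[:n_val]
    test = order[n_val:n_val + n_test]
    train = order[n_val + n_test:]  # everything left over (about 90%)
    return train, val, test


# Variant counts from the caption: single mutants + double mutants.
n_abeta = 499 + 15567
n_tdp43 = 1266 + 56730

train, val, test = split_indices(n_abeta)
print(len(train), len(val), len(test))

# Share of heatmap cells with no additive-effect estimate (white squares).
unobserved = {"Abeta": (355, 882), "TDP-43": (433, 1764)}
percent_unobserved = {
    name: f"{100 * missing / total:.1f}"
    for name, (missing, total) in unobserved.items()
}
print(percent_unobserved)
```

Slicing a single permutation guarantees that the three sets are disjoint and together cover every variant, which is the property that matters when test-set points are described as held out.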
