Fig. 2 | Genome Biology

From: MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

MAVE-NN quantitative modeling strategy. a Structure of latent phenotype models. A deterministic G-P map f(x) maps each sequence x to a latent phenotype ϕ, after which a probabilistic measurement process p(y|ϕ) generates a corresponding measurement y. b Example of an MPA measurement process inferred from the sort-seq MPRA data of Kinney et al. [12]. MPA measurement processes are used when y values are discrete. c Structure of a GE regression model, which is used when y is continuous. A GE measurement process assumes that the mode of p(y|ϕ), called the prediction \(\hat{y}\), is given by a nonlinear function g(ϕ), and that the scatter about this mode is described by a noise model \(p(y|\hat{y})\). d Example of a GE measurement process inferred from the DMS data of Olson et al. [8]. Shown are the nonlinearity, the 68% PI, and the 95% PI. e Information-theoretic quantities used to assess model performance. Intrinsic information, \(I_\mathrm{int}\), is the mutual information between sequences x and measurements y. Predictive information, \(I_\mathrm{pre}\), is the mutual information between measurements y and the latent phenotype values ϕ assigned by a G-P map. Variational information, \(I_\mathrm{var}\), is a linear transformation of the log likelihood of a full latent phenotype model. The model performance inequality, \(I_\mathrm{int} \geq I_\mathrm{pre} \geq I_\mathrm{var}\), always holds on test data (modulo finite data uncertainties), with \(I_\mathrm{int} = I_\mathrm{pre}\) when the G-P map is correct, and \(I_\mathrm{pre} = I_\mathrm{var}\) when the measurement process correctly describes the distribution of y conditioned on ϕ. G-P: genotype-phenotype; MPA: measurement process agnostic; MPRA: massively parallel reporter assay; GE: global epistasis; DMS: deep mutational scanning; PI: prediction interval
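To make panels a, c, d and e concrete, here is a minimal, self-contained sketch of a latent phenotype model with a GE measurement process. It is not MAVE-NN's API: every function name here is hypothetical, and several modeling choices are assumptions made for illustration. These are an additive G-P map over a DNA alphabet, a sigmoidal nonlinearity g(ϕ), and a homoscedastic Gaussian noise model (whose mode equals \(\hat{y}\)). The sketch also assumes that \(I_\mathrm{var}\) takes the form \(H[y] + \frac{1}{N}\sum_n \log_2 p(y_n|\phi_n)\), with the entropy \(H[y]\) supplied by the caller rather than estimated.

```python
import math
import random

ALPHABET = "ACGT"  # assumed DNA alphabet for this toy example


def additive_gp_map(seq, theta0, theta):
    """Hypothetical additive G-P map f(x): phi = theta0 + sum_l theta[l][x_l]."""
    return theta0 + sum(theta[l][ALPHABET.index(c)] for l, c in enumerate(seq))


def sigmoid_nonlinearity(phi, a=0.0, b=1.0, slope=1.0, center=0.0):
    """Hypothetical monotonic nonlinearity g(phi); its output is the prediction y_hat."""
    return a + b / (1.0 + math.exp(-slope * (phi - center)))


def gaussian_log2_lik(y, y_hat, sigma):
    """log2 p(y | y_hat) under an assumed homoscedastic Gaussian noise model (mode = y_hat)."""
    log_p = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - y_hat) ** 2 / (2.0 * sigma ** 2)
    return log_p / math.log(2.0)


def prediction_interval(y_hat, sigma, z):
    """Central Gaussian interval y_hat +/- z*sigma (z=1 gives ~68%, z=1.96 gives ~95%)."""
    return (y_hat - z * sigma, y_hat + z * sigma)


def variational_information(ys, y_hats, sigma, entropy_y_bits):
    """Assumed form I_var = H[y] + mean_n log2 p(y_n | phi_n); H[y] must be supplied."""
    mean_ll = sum(gaussian_log2_lik(y, yh, sigma) for y, yh in zip(ys, y_hats)) / len(ys)
    return entropy_y_bits + mean_ll


def simulate(n, sigma, seed=0, length=5):
    """Draw random sequences, map them through f and g, then add Gaussian noise."""
    rng = random.Random(seed)
    theta = [[rng.gauss(0.0, 1.0) for _ in ALPHABET] for _ in range(length)]
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]
    y_hats = [sigmoid_nonlinearity(additive_gp_map(s, 0.0, theta)) for s in seqs]
    ys = [yh + rng.gauss(0.0, sigma) for yh in y_hats]
    return ys, y_hats


def coverage(ys, y_hats, sigma, z):
    """Fraction of measurements that fall inside their prediction interval."""
    hits = 0
    for y, yh in zip(ys, y_hats):
        lo, hi = prediction_interval(yh, sigma, z)
        hits += lo <= y <= hi
    return hits / len(ys)
```

As a usage example, `ys, y_hats = simulate(20000, 0.05)` generates toy data. `coverage(ys, y_hats, 0.05, 1.0)` and `coverage(ys, y_hats, 0.05, 1.96)` should then come out close to 0.68 and 0.95, mirroring the 68% and 95% PIs in panel d. The true noise level should also give a larger `variational_information` than a mis-specified one, such as `sigma=0.2`. Because \(H[y]\) is the same for every model fit to the same data, such comparisons depend only on the log-likelihood term. This matches the caption's point that \(I_\mathrm{var}\) approaches \(I_\mathrm{pre}\) only when the measurement process is correctly specified.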