Fig. 3 | Genome Biology

Fig. 3

From: Cross-protein transfer learning substantially improves disease variant prediction

Training on DMS is important for CPT-1 performance. A We compared CPT-1 performance to several baselines that do not fully use the DMS data: averaging EVE and ESM-1v, averaging random features (each set to the correct sign), and averaging features chosen by feature selection. CPT-1 outperforms these baselines, especially in the high-sensitivity regime, which demonstrates the value of a full training procedure on DMS data. B We examined the dependence of CPT-1 performance on the number of training genes used. Each dot indicates a specific choice of training genes, with the mean shown as a black horizontal bar. Adding training genes always increases average performance, but there is significant variance and the gains appear to saturate. We also examined the use of additional, more heterogeneous datasets from ProteinGym and found that this did not increase performance (Additional file 1: Fig. S2).
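The averaging baselines in panel A can be illustrated with a minimal sketch. It is not the authors' implementation: the caption does not say how features are standardized or how the "correct sign" is determined, so the sketch assumes z-scoring and assumes the sign is taken from the covariance with a reference score. All function names, feature names, and numbers below are hypothetical.

```python
import random
import statistics


def zscore(values):
    """Standardize a feature column (assumption: the caption does not specify scaling)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def covariance_sign(xs, ys):
    """Return +1.0 if xs and ys co-vary positively (or not at all), else -1.0."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return 1.0 if cov >= 0 else -1.0


def sign_corrected_average(feature_columns, reference):
    """Average standardized features after orienting each one toward the reference.

    Assumption: "set to the correct sign" is modeled here as flipping any
    feature that is anti-correlated with a reference score.
    """
    oriented = []
    for column in feature_columns:
        z = zscore(column)
        sign = covariance_sign(z, reference)
        oriented.append([sign * v for v in z])
    return [sum(vals) / len(oriented) for vals in zip(*oriented)]


def random_feature_baseline(feature_table, reference, k, seed=0):
    """Pick k features at random and average them with sign correction."""
    rng = random.Random(seed)
    names = rng.sample(sorted(feature_table), k)
    columns = [feature_table[name] for name in names]
    return names, sign_corrected_average(columns, reference)


# Toy data: four variants, three hypothetical features, one reference score.
features = {
    "feat_a": [0.1, 0.4, 0.9, 0.7],
    "feat_b": [5.0, 3.0, 1.0, 2.0],  # anti-correlated with the reference
    "feat_c": [1.0, 2.0, 2.5, 4.0],
}
reference = [0.0, 0.2, 1.0, 0.8]

chosen, baseline_scores = random_feature_baseline(features, reference, k=2, seed=1)
```

The same `sign_corrected_average` helper would cover the two-predictor baseline by passing it two score columns. Unlike a model trained on DMS data, none of these baselines learns feature weights, which is the contrast the panel draws.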
