Fig. 2 | Genome Biology

Fig. 2

From: Cross-protein transfer learning substantially improves disease variant prediction

CPT-1 achieves state-of-the-art performance on clinical variant and functional assay prediction. A Receiver-operating characteristic (ROC) curves for ESM-1v, EVE, and our transfer model CPT-1 on annotated missense variants in ClinVar. CPT-1 improves the true positive rate at all false positive rates over both baselines and has a significantly higher AUROC. B Specificity in the clinically relevant high-sensitivity regime on ClinVar missense variants. When all models are constrained to recall nearly all pathogenic variants, CPT-1 improves on EVE and ESM-1v by large margins. C Per-gene AUROC on ClinVar missense variants in 886 genes with at least four benign and four pathogenic variants. Interquartile range and median are shown in black; the mean is shown in white. CPT-1 matches or exceeds the per-gene AUROC of EVE on 72% of genes and of ESM-1v on 79% of genes. D CPT-1 outperforms REVEL on proteins that REVEL was not trained on, demonstrating the value of developing predictors with cross-protein transfer in mind. E We trained regression versions of CPT-1 to predict functional assays (Methods). We show Spearman’s \(\rho\) on DMS datasets of human proteins from ProteinGym (full details in Additional file 1: Table S3). The left plot compares CPT-1 to EVE, and the right compares CPT-1 to ESM-1v. In each plot, points above the diagonal line indicate a gene where CPT-1 outperforms the baseline. With the test protein held out in all cases, CPT-1 outperforms EVE on 16 out of 18 proteins and outperforms ESM-1v on 15 out of 18.
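The metrics in panels A to C can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the `(gene, label, score)` record layout, the convention that a higher score means "more pathogenic", and the default sensitivity floor of 0.95 are all assumptions made for illustration; the caption only says "nearly all pathogenic variants" and does not state the exact operating point.

```python
import math
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a pathogenic variant (label 1)
    scores higher than a benign one (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, min_sensitivity=0.95):
    """Specificity at the strictest threshold that still recalls at
    least `min_sensitivity` of pathogenic variants.
    The 0.95 default is an illustrative assumption."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    # Number of pathogenic variants that must be called pathogenic.
    needed = max(1, math.ceil(min_sensitivity * len(pos) - 1e-9))
    # Variants with score >= threshold are called pathogenic.
    threshold = pos[needed - 1]
    return sum(n < threshold for n in neg) / len(neg)


def per_gene_auroc(records, min_per_class=4):
    """AUROC per gene, keeping only genes with at least `min_per_class`
    benign and `min_per_class` pathogenic variants (four in panel C).
    `records` is an iterable of (gene, label, score) tuples."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, label, score in records:
        by_gene[gene][0].append(label)
        by_gene[gene][1].append(score)
    result = {}
    for gene, (labels, scores) in by_gene.items():
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos
        if n_pos >= min_per_class and n_neg >= min_per_class:
            result[gene] = auroc(labels, scores)
    return result
```

Comparing two models gene by gene, as in panel C, would then amount to calling `per_gene_auroc` once per model on the same variants and counting the genes where one value is at least as large as the other.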
