Fig. 4 | Genome Biology

Fig. 4

From: Cross-protein transfer learning substantially improves disease variant prediction


Vertebrate alignments are key to improved performance and provide a powerful baseline. A Specificity in the clinically relevant high-sensitivity regime on ClinVar missense variants. Removing vertebrate alignments from CPT-1 significantly decreases the margin of improvement over baseline. Conservation among 100 vertebrates is a powerful single-feature baseline and is competitive with much more complex models in the high-sensitivity regime. Vertebrate alignments are much less powerful in the high-specificity regime (Additional file 1: Table S2). B If a missense variant from ClinVar appears in a vertebrate alignment, it is highly likely to be benign. Of the variants that do not occur in any of our studied vertebrates, 39% are benign. Of the variants that occur in a vertebrate, 91% are benign. Of the variants that occur in a mammal (a subset of vertebrates), 97% are benign. EVE and ESM-1v do not fully leverage this signal because both methods employ sequence redundancy filtering; this signal is key to the improved performance of CPT-1.
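As an illustration of the metric in panel A, the following is a minimal sketch of how specificity at a fixed minimum sensitivity could be computed from variant scores. It is not the authors' evaluation code. The function name `specificity_at_sensitivity`, the label convention (1 for pathogenic, 0 for benign, higher score meaning more likely pathogenic), and the toy data are all assumptions made for this example.

```python
import math


def specificity_at_sensitivity(scores, labels, min_sensitivity):
    """Specificity at the strictest threshold that reaches min_sensitivity.

    Assumed convention (hypothetical, not from the paper):
    labels are 1 for pathogenic and 0 for benign, and a variant is
    called pathogenic when its score is >= the chosen threshold.
    """
    pathogenic = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Fewest pathogenic variants that must be called to reach the target
    # sensitivity; the small tolerance guards against float rounding.
    needed = max(1, math.ceil(min_sensitivity * len(pathogenic) - 1e-9))
    threshold = pathogenic[needed - 1]

    # Benign variants scoring below the threshold are true negatives.
    true_negatives = sum(1 for s in benign if s < threshold)
    return true_negatives / len(benign)


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
```

Raising the required sensitivity lowers the threshold, so more benign variants are called pathogenic and specificity drops. The high-sensitivity regime described in the caption is where that trade-off is examined.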
