Fig. 2 | Genome Biology

Fig. 2

From: Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models


Genealogy and success model of SARS-CoV-2 haplotypes. a) Median-joining network of 13,979 full-length sequences (haplotypes with a frequency < 0.05% were removed). Nodes are haplotypes and edges are mutational events. Node size is proportional to the number of individuals. The red gradient in the center of a node indicates the date of emergence (the light red haplotype of the Wuhan reference sequence is indicated). Node perimeter darkness reflects the success of a haplotype based on the number of days, number of regions, and number of individuals from which it was sampled. Dark-perimeter, small-diameter nodes indicate haplotypes that persisted globally for long periods but did not expand into many individuals (unsuccessful). Diamonds denote individuals with an amino acid change in the serine/arginine-rich region of the N protein (see text). Pie charts indicate the geographic distribution of the major nodes. Measures of mutability are given for the three major clades as mutations per day and mutations per individual, and dN/dS is provided for each major clade (see text). An exclamation point signifies a back mutation to the reference sequence. b) Alignment of the hypermutable region at the signal peptide sequence of S, shown in the upper right. The conserved string of phenylalanine, leucine, and valine residues results in the T-rich region of the signal peptide at the nucleotide level and three runs of the repeat sequence “GTTTT”, which could be responsible for the hypermutation. Haplotypes that are linked to individuals with the hypermutable site are shown with a pink asterisk in a (nodes for the haplotypes with hypermutation are not shown due to low frequency; see “Methods”).
