Re-assembly of nineteenth-century smallpox vaccine genomes reveals the contemporaneous use of horsepox and horsepox-related viruses in the USA

According to a recent article published in Genome Biology, Duggan and coworkers sequenced and partially assembled five genomes of smallpox vaccines from the nineteenth century. No information on the genome ends was presented, although these regions are important for understanding the evolutionary relationships among smallpox vaccine genomes over the centuries. We re-assembled the genomes, which include the largest genomes in the vaccinia lineage and one true horsepox strain. Moreover, the assemblies reveal a diverse genetic structure in the genome ends. Our data emphasize the concurrent use of horsepox and horsepox-related viruses as the smallpox vaccine in the nineteenth century.


To the Editor
It is still a mystery which virus early vaccinators and vaccine manufacturers used as the smallpox vaccine in the nineteenth century: cowpox virus (CPXV), horsepox virus (HSPV), or vaccinia virus (VACV). Edward Jenner, who developed the first smallpox vaccine in 1796, supposedly used cowpox lymph, but historical evidence points to the use of horsepox lymph on several occasions, including his first immunization experiments [1][2][3]. In fact, CPXV has never been detected molecularly in any smallpox vaccine. However, an HSPV-related virus has recently been described as the smallpox vaccine seed used by the Mulford Laboratories in the USA in 1902 [4].
The Mulford 1902 genome is > 99.7% similar to the central conserved region of the HSPV-MNR-76 genome. However, it differs in the variable flanking regions, mainly by the presence of two deletions of 10.7 kb and 5.5 kb in the left and right genome ends, respectively, which are a hallmark of all known VACV strains [4,5]. Therefore, the analysis of the whole genome structure is essential to understand the genetic makeup of old smallpox vaccines [6].
In a recent Genome Biology article, Duggan and colleagues described the partial genomic sequences of five American smallpox vaccines from the mid to late nineteenth century [7]. Phylogenetic analyses revealed that the viruses are closely related to HSPV and to the Mulford 1902 strain. However, the only genome assembled de novo (VK1) has 184,677 bp and lacks nearly 20,000 bp of the left end. Because the right end is complete, we hypothesized that reads covering the left end should also be available.
Therefore, FastQ files were downloaded from Sequence Read Archive (PRJNA561155) and trimmed (Trimmomatic-v0.39, Phred-33 quality score) [8]. Full genomes were assembled by an iterative workflow: de novo assembly of adapter-removed reads by using Spades v3.13.1 (Phred offset-33, standard parameters) [9], mapping of the trimmed reads to the contigs to increase contig size, visual screening for accuracy, and correction of mis-assembled regions with Geneious Prime 2020.0.5. The final genomes were validated for accuracy by mapping with all reads and screened for inconsistency in the continuous assembly. Inverted terminal repeat (ITR) regions were identified with Geneious Prime Repeat Finder. Genomes were annotated by using Genome Annotation Transfer Utility (GATU) [10] and CLC Main Workbench v8.0, followed by visual screening [4,6]. Orthopoxvirus sequences were aligned by using Mafft Server v7 [11] and used for phylogenetic inference by using Mega v6 [12].
Fig. 1 Phylogenetic inference of the old smallpox vaccines VK01, VK02, VK05, VK08, and VK12. The multialignment of 37 orthopoxvirus genomes, including the VK samples, was used as input for tree construction by using MEGA 6, opting for the maximum likelihood method based on the Tamura-Nei substitution model, Uniform rates model with 1000 bootstrap replicates. Numbers indicate the percentage of bootstrap support from 1000 replicates (> 50% is shown). The scale bar indicates the number of substitutions per site. The VACV clusters are indicated on the right. A similar tree topology was obtained by using the neighbor-joining method. GenBank accession numbers are indicated in the "Availability of data and materials" section.
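The idea behind identifying ITRs in an assembled genome can be illustrated with a minimal Python sketch. It is not the Geneious Prime Repeat Finder algorithm used in this work; it simply assumes an exact-match definition in which the left end of the genome equals the reverse complement of the right end, and the function names `reverse_complement` and `itr_length` are hypothetical helpers introduced for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def itr_length(genome: str) -> int:
    """Length of the exact-match inverted terminal repeat.

    Assumption (illustrative only): the ITR is the longest prefix of the
    genome that is identical to the reverse complement of its suffix.
    Real repeat finders also tolerate mismatches and indels.
    """
    forward = genome.upper()
    reverse = reverse_complement(forward)
    # An ITR cannot be longer than half the genome.
    limit = len(forward) // 2
    n = 0
    while n < limit and forward[n] == reverse[n]:
        n += 1
    return n


# Toy genome: an 8-bp terminal repeat flanking a unique central region.
itr = "ATTCGGCA"
toy_genome = itr + "GGGGGGGGGA" + reverse_complement(itr)
print(itr_length(toy_genome))  # 8
```

On real assemblies, exact matching would underestimate the ITR length wherever the two copies differ by a sequencing error or a genuine polymorphism, which is one reason the final ITR boundaries were also screened visually.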
All five re-assembled genomes are phylogenetically clustered within the HSPV subgroup of the VACV lineage (Fig. 1), confirming the findings of Duggan and colleagues [7]. However, our data provide important genetic information that was not revealed by the published assembly. We observed genomes of different sizes and numbers of ORFs and, interestingly, with distinct structures in the left and right ends. Table 1 summarizes our findings and Fig. 2 shows the genome structure of the left and right ends of the VK genomes. VK01 and VK12 have the largest genomes in the VACV lineage, with 214,388 bp and 219,647 bp, respectively (Table 1), mainly due to the presence of unique insertions of 14.2 kb and 15.8 kb in the left end, probably resulting from a non-tandem duplication of an equivalent region in the right end of the genome and the insertion of cowpox gene orthologs (Fig. 2a, insert).
Table 1 notes: The deletions correspond to 10.7 kb and 5.5 kb stretches of DNA present in HSPV-MNR-76, but absent in all VACV strains and in the Mulford 1902 strain [4,5]. Horsepox virus strain MNR-76 was included for the sake of comparison [5]. GenBank accession numbers are indicated in the "Availability of data and materials" section.
Interestingly, the 10.7-kb and the 5.5-kb deletions found, respectively, in the left and right ends of the genomes of all VACV strains [5] as well as in the Mulford 1902 strain [4] are also found partially or completely in the left and right ends, respectively, of the VK01 and VK08 genomes. However, neither deletion is found in VK05, whereas VK12 carries only the right deletion and VK02 only the left deletion. In fact, the VK05 genome has the same genome structure as HSPV-MNR-76 (Fig. 2) and the highest identity to it across the whole genome, representing a true HSPV strain (Table 1). So far, MNR-76, isolated from Mongolian horses in 1976, and MNR, a synthetic recombinant horsepox virus, are the only extant strains of HSPV [5,13]. The VK08 genome is very similar to VK01, except for the absence of the 14.2-kb insertion (Fig. 2a, insert). The VK02 genome has a 15-kb deletion near the very left end of the genome (Fig. 2a), resulting in the shortest ITRs in the VACV lineage (Table 1).
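As an illustration of what a whole-genome identity value such as those compared against HSPV-MNR-76 can mean, the following Python sketch computes percent identity between two rows of a multiple alignment. It is an assumption-laden example rather than the calculation used for Table 1: the text does not state how gaps were treated, so this sketch excludes gapped columns from the denominator, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption (illustrative only): columns where either sequence has a
    gap are skipped, so large insertions or deletions do not lower the
    score. Other conventions count gapped columns as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))  # 87.5
```

Under this gap-excluding convention, two genomes can show very high identity while still differing by kilobase-scale insertions or deletions in the genome ends, which is why identity values are best read alongside the structural comparison in Fig. 2.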
In conclusion, the re-assembly of the five VK genomes exposes the complex genetic diversity of the old smallpox vaccine genomes. We present evidence of the contemporaneous use of HSPV and HSPV-related viruses as the smallpox vaccine in the nineteenth century. The results also reveal that HSPV-related vaccines had been used in the USA at least 36 years before the Mulford 1902 strain. In the nineteenth century, vaccine seeds were constantly imported from Europe for smallpox vaccine production in the USA. Therefore, it is likely that HSPV and HSPV-related viruses were repeatedly introduced into the USA at that time and that similar vaccines were also manufactured and used in Europe in the nineteenth century [14].