Table 4 The VirStrain identification result of 32 real sequencing datasets (Continued)

From: VirStrain: a strain identification tool for RNA viruses

| BioSample accession number | Sequencing platform | Blast result | VirStrain result | Data size | Running time | Region of sample | Region of clusters | Genome in the DB |
|---|---|---|---|---|---|---|---|---|
| SAMN15144727 | BGISEQ-500 | Unknown | MT568634.1 | 32 MB | 13 s | Guangzhou, China | Guangzhou, China | N |
| SAMN15637956 | Illumina HiSeq 4000 | Unknown | MT066175.1 | 6.8 GB | 114 s | China | Guangzhou, China | N |
| SAMN16058334 | NextSeq 500 | Unknown | MT633030.1 | 8.2 GB | 73 s | Washington, USA | Washington, USA | N |
| SAMN16068353 | NextSeq 500 | Unknown | MT345882.1 | 320 MB | 33 s | Nevada, USA | Washington, USA | N |
| SAMN16068354 | NextSeq 500 | Unknown | MT641532.1 | 612 MB | 16 s | Nevada, USA | Washington, USA | N |
| SAMN15678404 | NextSeq 500 | Unknown | MT632835.1 | 974 MB | 29 s | Washington, USA | Washington, USA | N |
| SAMN15678405 | NextSeq 500 | Unknown | MT375468.1 | 539 MB | 27 s | Washington, USA | Washington, USA | N |
| SAMN14668182 | Ion Torrent S5 | Unknown | MT704132.1 | 24 MB | 13 s | New York, USA | Maryland, USA | N |

  1. “Unknown” in the “Blast result” column means that the complete genome of that dataset is not available. “Region of clusters” is the output of VirStrain based on the metadata associated with the reference strains in each cluster. For clusters containing more than one reference strain, we use a majority vote to obtain the geographical region. “Genome in the DB” indicates whether the complete genome of that dataset can be found in the reference database of VirStrain, yes (Y) or no (N); entries in red indicate samples that have complete genomes. “DP cruise ship” refers to the Diamond Princess cruise ship. “v_rank” is the ranking of the strain in the output of VirStrain. “blast_rank” is the ranking of the strain in the output of Blast.
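
The majority vote mentioned in the note can be illustrated with a minimal sketch. This is not VirStrain's own code: the function name `majority_region` and the example labels are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the note does not say how ties are handled.

```python
from collections import Counter


def majority_region(regions):
    """Return the most frequent region label among a cluster's reference strains.

    `regions` is a list of region strings, one per reference strain.
    Returns None for an empty cluster. On a tie, the label that appears
    first in the list wins (an assumption, not taken from the paper).
    """
    if not regions:
        return None
    # most_common(1) yields [(label, count)] for the top label;
    # labels with equal counts keep their first-seen order.
    return Counter(regions).most_common(1)[0][0]


# Hypothetical metadata for one cluster, for illustration only.
cluster_regions = ["Washington, USA", "Washington, USA", "Nevada, USA"]
print(majority_region(cluster_regions))
```

With such a rule, a sample collected in one region can be assigned a different “Region of clusters” value when most reference strains in its cluster come from elsewhere.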